Welcome

T2K REST API Documentation

To use the T2K REST interface, each request must be authenticated with HTTP Basic Authentication. The username and password are the same ones used to log in to T2K.
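As an illustration of the authentication requirement, the following minimal sketch builds the Authorization header with the Python standard library. The function names are hypothetical and the URL passed in by the caller is an assumption (this page does not state the address of the T2K server); the request is only built, not sent.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


def authenticated_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the Basic credentials.

    `url` is supplied by the caller; the real T2K host is not given in
    this documentation, so any value used here is a placeholder.
    """
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The resulting object can be passed to urllib.request.urlopen once a real server address and valid T2K credentials are available.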

A Java interface is available at https://github.com/imatesiu/RestT2K . Thanks to Giorgio Spagnolo of the Formal Methods and Tools Laboratory (FMT, ISTI, CNR) for his support.
ENDPOINT
METHOD
CONTENT-TYPE

/rest/new_corpus
POST
MULTIPART_CONTENT
Required arguments
  • corpus_file: a UTF-8 encoded corpus file, or a .zip archive containing a collection of documents
  • language: allowed values are IT for Italian and GB for English
  • name: the name associated with the corpus
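The three arguments above can be packed into a multipart POST request as in the following sketch, which uses only the Python standard library. The helper names, the base URL supplied by the caller and the application/octet-stream part type are assumptions made for illustration; authentication is omitted for brevity and the request is built without being sent.

```python
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

ALLOWED_LANGUAGES = ("IT", "GB")  # values listed in the documentation


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes,
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body.

    Returns the body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n")
        chunks.append(part.encode("utf-8"))
    file_head = (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{file_field}"; '
                 f'filename="{filename}"\r\n'
                 "Content-Type: application/octet-stream\r\n\r\n")
    chunks.append(file_head.encode("utf-8"))
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_new_corpus_request(base_url: str, name: str, language: str,
                             filename: str, file_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request for /rest/new_corpus.

    `base_url` is a placeholder: the T2K host is not given in this page.
    """
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"language must be one of {ALLOWED_LANGUAGES}")
    body, content_type = encode_multipart(
        {"name": name, "language": language}, "corpus_file", filename, file_bytes)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/rest/new_corpus",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Rejecting an unsupported language on the client side avoids a round trip; how the server itself reacts to invalid values is not described here.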
Response
{
  "id": 1,
  "part_of_speech": {
    "analysis_file": null,
    "id": 1,
    "created": "2015-02-13T11:49:04.949Z",
    "modified": "2015-02-13T11:49:04.949Z",
    "status": "defined"
  },
  "term_extraction": {
    "analysis_file": null,
    "id": 1,
    "created": "2015-02-13T11:49:04.949Z",
    "modified": "2015-02-13T11:49:04.949Z",
    "status": "defined",
    "term_extraction_configuration": null
  },
  "term_extraction_indexer": {
    "conll_file": null,
    "id": 1,
    "created": "2015-02-13T11:49:04.951Z",
    "modified": "2015-02-13T11:49:04.951Z",
    "status": "defined"
  },
  "created": "2015-02-13T11:49:04.948Z",
  "modified": "2015-02-13T11:49:04.966Z",
  "name": "TEST",
  "language": "IT",
  "status": "defined",
  "user": 1,
  "corpus_file": "t2k_uploads/Costituzione_small_25_1.txt"
}
        
The response contains the REST representation of the corpus. In particular:
  • an id is assigned to the corpus
  • the status of the corpus is "defined", meaning that the corpus is ready to be analyzed. While an asynchronous operation (part-of-speech tagging, term extraction) is executing, the status is "running", meaning that T2K is performing an operation on the corpus. When the operation completes successfully, the status becomes "successful"; if an error occurs, the status becomes "failed".
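The status values described above suggest a simple polling loop for asynchronous operations. The sketch below is one possible approach, not a documented client: wait_for_corpus and its parameters are hypothetical, and the caller supplies fetch_status, a function returning the current status string of the corpus (for example by reading it from the corpus list), because this page does not describe a dedicated status endpoint.

```python
import time
from typing import Callable

# Statuses after which no further change is expected, per the list above.
TERMINAL_STATUSES = {"successful", "failed"}


def wait_for_corpus(fetch_status: Callable[[], str],
                    interval: float = 5.0,
                    max_polls: int = 60) -> str:
    """Poll until the corpus status is "successful" or "failed".

    `fetch_status` is a caller-supplied callable that returns the current
    status string. "defined" and "running" are treated as non-terminal.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"corpus still not finished after {max_polls} polls")
```

The polling interval and the upper bound on attempts are arbitrary defaults and should be tuned to the size of the corpus.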
/rest/corpus_list
GET
Response
[
  {
    "id": 1,
    "part_of_speech": {
      "analysis_file": null,
      "id": 1,
      "created": "2015-02-16T14:36:20.958Z",
      "modified": "2015-02-16T14:36:20.958Z",
      "status": "defined"
    },
    "term_extraction": {
      "analysis_file": null,
      "id": 1,
      "created": "2015-02-16T14:36:20.958Z",
      "modified": "2015-02-16T14:36:20.958Z",
      "status": "defined",
      "term_extraction_configuration": null
    },
    "term_extraction_indexer": {
      "conll_file": null,
      "id": 1,
      "created": "2015-02-16T14:36:20.960Z",
      "modified": "2015-02-16T14:36:20.960Z",
      "status": "defined"
    },
    "created": "2015-02-16T14:36:20.957Z",
    "modified": "2015-02-16T14:36:20.975Z",
    "name": "TEST",
    "language": "IT",
    "status": "defined",
    "user": 1,
    "corpus_file": "t2k_uploads/Costituzione_small_500_1.txt"
  }
]

        
The response contains the list of corpora belonging to the authenticated user.
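A response of this shape can be reduced to the fields most clients need. The following sketch assumes the response body has already been read as text; the function name summarize_corpora is hypothetical, and only the id, name and status keys shown in the sample above are used.

```python
import json
from typing import List, Tuple


def summarize_corpora(response_text: str) -> List[Tuple[int, str, str]]:
    """Return (id, name, status) for every corpus in a corpus_list response."""
    corpora = json.loads(response_text)
    return [(corpus["id"], corpus["name"], corpus["status"]) for corpus in corpora]
```

For the sample response shown above, this yields a single entry for the corpus named TEST with status "defined".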
/rest/corpus/{id}/execute/{action}
GET
Required arguments
  • id: the id of the corpus
  • action: the action to be performed on the corpus. Allowed values are:
    • part_of_speech
    • term_extraction
    • term_extraction_indexer
    • delete
    Note that part_of_speech, term_extraction and term_extraction_indexer are executed asynchronously, whereas delete is a synchronous operation.
Response: empty response with status code 200.
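The endpoint pattern above can be assembled and validated as in this sketch. The function name and the base URL supplied by the caller are assumptions (the T2K host is not given in this page), authentication is omitted for brevity, and the request is built without being sent.

```python
import urllib.request

# Actions listed in the documentation; all but "delete" run asynchronously.
ALLOWED_ACTIONS = ("part_of_speech", "term_extraction",
                   "term_extraction_indexer", "delete")
ASYNC_ACTIONS = frozenset(ALLOWED_ACTIONS) - {"delete"}


def build_execute_request(base_url: str, corpus_id: int, action: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /rest/corpus/{id}/execute/{action}.

    `base_url` is a placeholder supplied by the caller.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {ALLOWED_ACTIONS}")
    url = f"{base_url.rstrip('/')}/rest/corpus/{int(corpus_id)}/execute/{action}"
    return urllib.request.Request(url, method="GET")
```

Because the response to an asynchronous action is empty, a status code of 200 only indicates that the request was accepted; the outcome has to be read from the corpus status afterwards.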