Ingest API - Document Ingestion

The Documents API allows you to ingest documents into a custom collection. It provides two endpoints: one for single document ingestion and one for batch ingestion (multiple documents).


Authentication

All requests to the Doti API require a Bearer token. Include it in the Authorization header:

Authorization: Bearer {YOUR_ACCESS_TOKEN}

❗ Tokens must be generated by the Doti team. They are not yet user-creatable via the Portal.
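As a minimal sketch of the header described above, the helper below builds the headers for an authenticated request. The function name `build_auth_headers` is hypothetical, and the `Content-Type` header is an assumption based on the JSON request bodies shown in this document, not something the API is confirmed to require.

```python
def build_auth_headers(access_token: str) -> dict:
    """Return request headers carrying the Bearer token.

    The Content-Type entry is an assumption: the request bodies in this
    document are JSON, so declaring that is a reasonable default.
    """
    if not access_token:
        raise ValueError("access_token must be a non-empty string")
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
```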


API Endpoints

Ingest a Single Document into a Custom Collection

Use this endpoint to add or update a single document in a specified custom collection. Best for quick ingestion with immediate feedback.

Endpoint:

POST https://api.doti.ai/api/v2/collections/:collectionId/documents

URL Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| collectionId | string | The external identifier of the collection. |

Request Body

The body must contain a single document. Max payload size is 10MB.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| pageContent | string | Yes | The content of the document. |
| metadata | object | Yes | Metadata object for the document. |

Metadata fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | A unique identifier for the document. |
| url | string | Yes | The document's source URL. |
| urlDescription | string | Yes | A short description of the URL. |
| lastModified | string | Yes | Last modified timestamp (ISO 8601). |
| additionalData | object | Yes | Additional metadata (e.g., author). |
| score | number | No | Optional relevance score. |
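Checking a document against the field tables above before sending it can catch obvious mistakes early. The following is a rough client-side sketch only: `validate_document` is a hypothetical helper, and the server's actual validation rules may be stricter than, or different from, these type checks.

```python
# Required metadata fields and their expected Python types, taken from the
# table above. The human-readable labels are only used in error messages.
REQUIRED_METADATA = {
    "id": (str, "a string"),
    "url": (str, "a string"),
    "urlDescription": (str, "a string"),
    "lastModified": (str, "a string"),
    "additionalData": (dict, "an object"),
}


def validate_document(doc: dict) -> list:
    """Return a list of problems found in doc; an empty list means none."""
    problems = []
    if not isinstance(doc.get("pageContent"), str):
        problems.append("pageContent must be a string")
    metadata = doc.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("metadata must be an object")
        return problems
    for name, (expected_type, label) in REQUIRED_METADATA.items():
        if not isinstance(metadata.get(name), expected_type):
            problems.append(f"metadata.{name} must be {label}")
    # score is optional; when present it should be numeric (bool excluded).
    score = metadata.get("score")
    if score is not None and (
        isinstance(score, bool) or not isinstance(score, (int, float))
    ):
        problems.append("metadata.score must be a number")
    return problems
```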

Example Request

{
  "pageContent": "This is the content of the document.",
  "metadata": {
    "id": "doc-1",
    "url": "http://example.com/doc-1",
    "urlDescription": "Document Example",
    "lastModified": "2023-01-01T12:00:00Z",
    "additionalData": { "author": "John Doe" }
  }
}
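One way to send the request above from Python is sketched below, using only the standard library. The helper name `build_single_ingest_request` is hypothetical, and the `Content-Type` header is an assumption based on the JSON body; the function only constructs the request and does not send it.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.doti.ai/api/v2"


def build_single_ingest_request(collection_id, document, access_token):
    """Build (but do not send) a POST request for the single-document endpoint."""
    # Percent-encode the collection id so it is safe to place in the path.
    url = f"{BASE_URL}/collections/{quote(collection_id, safe='')}/documents"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",  # assumption: JSON body
        },
    )
```

The resulting object can be passed to `urllib.request.urlopen`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so the error bodies are read from the exception rather than from a normal return value.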

Example Responses

201 Created

{
  "message": "Document processed successfully",
  "success": true,
  "documentId": "doc-1"
}

400 Bad Request

{
  "message": "Document content is empty",
  "success": false,
  "documentId": "doc-1",
  "error": "Document content cannot be empty",
  "errorCode": "EMPTY_CONTENT"
}

409 Conflict

{
  "message": "Document id doc-1 already exists, please use a different id",
  "success": false,
  "documentId": "doc-1",
  "error": "Document already exists in another collection",
  "errorCode": "DOCUMENT_CONFLICT"
}

500 Internal Server Error

{
  "message": "Document processing failed due to an internal error",
  "success": false,
  "documentId": "doc-1",
  "error": "Internal processing error",
  "errorCode": "PROCESSING_ERROR"
}
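The responses above share a common shape, so a client can reduce them to a small result object. This is a sketch under assumptions: `IngestOutcome` and `interpret_single_response` are hypothetical names, and treating only 5xx statuses as retryable is a client-side policy choice, not documented API behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngestOutcome:
    ok: bool
    retryable: bool
    document_id: Optional[str]
    error_code: Optional[str]
    detail: str


def interpret_single_response(status: int, body: dict) -> IngestOutcome:
    """Summarize a single-document response given its status and JSON body."""
    ok = status == 201 and body.get("success") is True
    return IngestOutcome(
        ok=ok,
        # Assumption: server-side (5xx) failures may succeed on retry,
        # while 4xx responses need the request itself to change.
        retryable=status >= 500,
        document_id=body.get("documentId"),
        error_code=None if ok else body.get("errorCode"),
        detail=body.get("error") or body.get("message", ""),
    )
```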

Ingest Multiple Documents into a Custom Collection (Batch)

Use this endpoint to add or update multiple documents in a collection at once. Best for efficiency when handling bulk ingestion.

Endpoint:

POST https://api.doti.ai/api/v2/collections/:collectionId/documents/batch

URL Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| collectionId | string | The external identifier of the collection. |

Request Body

The body must contain an array of documents. Each request can carry at most 100 documents and a payload of at most 10MB.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| documents | object[] | Yes | Array of document objects. |

Each document has the same structure as the single ingestion request.

Example Request

{
  "documents": [
    {
      "pageContent": "This is the content of the first document.",
      "metadata": {
        "id": "doc-1",
        "url": "http://example.com/doc-1",
        "urlDescription": "First Document",
        "lastModified": "2023-01-01T12:00:00Z",
        "additionalData": { "author": "John Doe" }
      }
    },
    {
      "pageContent": "This is the content of the second document.",
      "metadata": {
        "id": "doc-2",
        "url": "http://example.com/doc-2",
        "urlDescription": "Second Document",
        "lastModified": "2023-01-02T15:30:00Z",
        "additionalData": { "author": "Jane Smith" }
      }
    }
  ]
}
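When ingesting more documents than one batch request allows, the list has to be split to respect the 100-document and 10MB limits. The sketch below shows one possible approach; `chunk_documents` is a hypothetical helper, and interpreting 10MB as 10,000,000 bytes of serialized JSON is an assumption, since the exact byte count is not specified here.

```python
import json

MAX_DOCS_PER_BATCH = 100
MAX_PAYLOAD_BYTES = 10 * 1000 * 1000  # assumption: 10MB read as 10^7 bytes


def chunk_documents(documents, max_docs=MAX_DOCS_PER_BATCH,
                    max_bytes=MAX_PAYLOAD_BYTES):
    """Split documents into lists that each fit one batch request."""
    # Size of the empty {"documents": []} wrapper around every batch.
    envelope = len(json.dumps({"documents": []}).encode("utf-8"))
    batches, current, current_size = [], [], envelope
    for doc in documents:
        # +2 covers the ", " separator; slightly conservative for the first doc.
        doc_size = len(json.dumps(doc).encode("utf-8")) + 2
        if current and (len(current) >= max_docs
                        or current_size + doc_size > max_bytes):
            batches.append(current)
            current, current_size = [], envelope
        # A single document larger than max_bytes still gets its own batch;
        # it is left to the server to reject it.
        current.append(doc)
        current_size += doc_size
    if current:
        batches.append(current)
    return batches
```

Each returned list can then be wrapped as `{"documents": batch}` and sent as one request body.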

Example Responses

202 Accepted

{
  "message": "All documents processed successfully",
  "processedCount": 2,
  "faultyDocuments": []
}

207 Multi-Status

{
  "message": "1 documents processed successfully, 1 documents failed",
  "processedCount": 1,
  "faultyDocuments": [
    {
      "documentId": "doc-2",
      "error": "Document content is empty",
      "errorCode": "EMPTY_CONTENT",
      "metadata": {
        "url": "http://example.com/doc-2"
      }
    }
  ]
}

400 Bad Request

{
  "message": "0 documents processed successfully, 2 documents failed",
  "processedCount": 0,
  "faultyDocuments": [
    {
      "documentId": "doc-1",
      "error": "Document id doc-1 already exists, please use a different id",
      "errorCode": "DOCUMENT_CONFLICT",
      "metadata": {
        "existingCollections": ["other-collection-id"],
        "targetCollection": "internal-collection-id"
      }
    },
    {
      "documentId": "doc-2",
      "error": "Schema validation failed",
      "errorCode": "VALIDATION_ERROR",
      "metadata": {
        "validationErrors": []
      }
    }
  ]
}
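Because a batch response can report failures per document, it can help to group the `faultyDocuments` entries by `errorCode` before deciding what to fix or resend. A minimal sketch, where `group_faulty_documents` is a hypothetical helper and the `"UNKNOWN"` fallback key is an assumption for entries that lack an `errorCode`:

```python
from collections import defaultdict


def group_faulty_documents(response_body: dict) -> dict:
    """Map each errorCode to the list of document ids that failed with it."""
    grouped = defaultdict(list)
    for faulty in response_body.get("faultyDocuments", []):
        code = faulty.get("errorCode", "UNKNOWN")
        grouped[code].append(faulty.get("documentId"))
    return dict(grouped)
```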

Rate Limiting

The Documents API uses a points-based rate limiting system:

  • Single Document Endpoint: 1 point per request

  • Batch Endpoint: 10 points per request

Example (with 100 points/minute limit):

  • 100 single-document requests, OR

  • 10 batch requests, OR

  • A mix (e.g., 50 single + 5 batch).
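The point arithmetic in the example above can be sketched as follows. The function names are hypothetical, and the default of 100 points per minute is only the illustrative figure used in the example, not a confirmed limit for any account.

```python
SINGLE_REQUEST_POINTS = 1
BATCH_REQUEST_POINTS = 10


def points_used(single_requests: int, batch_requests: int) -> int:
    """Total points consumed by a mix of single and batch requests."""
    return (single_requests * SINGLE_REQUEST_POINTS
            + batch_requests * BATCH_REQUEST_POINTS)


def fits_budget(single_requests: int, batch_requests: int,
                limit_per_minute: int = 100) -> bool:
    """Check whether a planned mix of requests stays within the limit."""
    return points_used(single_requests, batch_requests) <= limit_per_minute
```

For the mix in the example, 50 single requests and 5 batch requests cost 50 + 50 = 100 points, which exactly uses up a 100-point budget.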


When to Use Which Endpoint

  • Single Document (/documents) → Best for individual documents, simple error handling, immediate feedback.

  • Batch (/documents/batch) → Best for multiple documents and efficient API usage. Some documents can fail while others succeed, so check faultyDocuments in the response.

Both endpoints ensure documents are validated, stored, and associated with the specified custom collection.
