Rate Limits

To ensure fair use and maintain high performance across all customers, Doti’s API enforces rate limits. This guide explains our policies and how to design resilient applications around them.

We enforce three types of rate limits:

  1. Search API Rate Limits – Controls how many search requests can be made per minute.

  2. Request Token Limits – Controls the size of an individual request payload.

  3. Documents API Rate Limits – Controls ingestion operations with a points-based system.


1. Search API Rate Limits

Doti applies a fixed-window rate-limiting algorithm to search endpoints to manage usage.
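A fixed-window limiter counts requests within discrete time windows and resets the counter when a new window begins. The sketch below illustrates the idea in TypeScript; the window size and capacity values are assumptions for illustration only, not Doti's exact configuration.

// Minimal fixed-window rate limiter sketch (illustrative only).
// WINDOW_MS and CAPACITY are assumed values, not Doti's actual configuration.
const WINDOW_MS = 60_000;
const CAPACITY = 100;

let windowStart = Date.now();
let count = 0;

function allowRequest(): boolean {
  const now = Date.now();
  // Start a new window once the current one has elapsed.
  if (now - windowStart >= WINDOW_MS) {
    windowStart = now;
    count = 0;
  }
  if (count < CAPACITY) {
    count++;
    return true;  // request proceeds
  }
  return false;   // request would be rejected with 429
}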

Default Policy

If the rate limit is exceeded, the API will respond with:

HTTP 429 Too Many Requests

{
  "error": "Rate limit exceeded",
  "message": "Too many search requests. Please try again later.",
  "retryAfter": 59,
  "remaining": 0,
  "capacity": 100
}

Response Headers

  • X-RateLimit-Remaining: Requests left in the current window

  • X-RateLimit-Capacity: Total allowed requests per window

  • X-RateLimit-RetryAfter: Time (in seconds) before retrying
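A client can inspect these headers after each call to decide whether it is safe to keep sending requests. The snippet below is a minimal sketch using fetch; the endpoint URL and authorization header are placeholders, not documented values.

// Sketch: read Doti's rate limit headers after a search call.
// The URL and auth header below are placeholders.
const response = await fetch("https://api.example.com/search?q=hello", {
  headers: { Authorization: "Bearer <API_KEY>" },
});

const remaining = Number(response.headers.get("X-RateLimit-Remaining") ?? "0");
const capacity = Number(response.headers.get("X-RateLimit-Capacity") ?? "0");
const retryAfter = Number(response.headers.get("X-RateLimit-RetryAfter") ?? "0");

if (response.status === 429) {
  console.warn(`Rate limited; retry in ${retryAfter}s (capacity ${capacity}).`);
} else {
  console.log(`${remaining}/${capacity} requests left in this window.`);
}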


2. Request Token Limits

To ensure optimal performance and avoid oversized payloads, Doti limits the size of each API call by token count (a token is a small chunk of text, typically part of a word).

Token Limit Policy

  • Maximum tokens per request: 40,000

If a request exceeds this limit, the API returns an error such as:

Maximum tokens per request is 40000. Your messages resulted in 41234 tokens. Please reduce the length of the messages.
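Because exact tokenization varies, clients often estimate token counts before sending a request. The sketch below uses a rough characters-per-token heuristic; the 4-characters-per-token ratio is an assumption, not Doti's tokenizer, while MAX_TOKENS mirrors the documented 40,000 limit.

// Rough pre-flight token check (heuristic, not Doti's actual tokenizer).
const MAX_TOKENS = 40_000;

function estimateTokens(text: string): number {
  // Assume roughly 4 characters per token; adjust for your content.
  return Math.ceil(text.length / 4);
}

function assertWithinTokenLimit(payloadText: string): void {
  const estimated = estimateTokens(payloadText);
  if (estimated > MAX_TOKENS) {
    throw new Error(
      `Estimated ${estimated} tokens exceeds the ${MAX_TOKENS}-token request limit; ` +
      `split or shorten the payload before sending.`
    );
  }
}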

3. Documents API Rate Limits

The Documents API uses a points-based rate limiting system to balance efficiency and throughput for ingestion workloads.

  • Single Document Endpoint: 1 point per request

  • Batch Endpoint: 10 points per request

Example (with 100 points/minute limit)

  • 100 single-document requests, OR

  • 10 batch requests, OR

  • A mix (e.g., 50 single + 5 batch).

This system accounts for the higher processing load of batch ingestion.
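If you plan ingestion jobs up front, it can help to compute how many points a batch of operations will consume before sending it. The helper below encodes the point costs listed above; the 100-points-per-minute budget is the example figure from this section, not necessarily your workspace's limit.

// Points cost per Documents API call, per the costs listed above.
const POINTS = { single: 1, batch: 10 } as const;

// Example budget from the docs; your workspace limit may differ.
const BUDGET_PER_MINUTE = 100;

function pointsNeeded(singleRequests: number, batchRequests: number): number {
  return singleRequests * POINTS.single + batchRequests * POINTS.batch;
}

// 50 single + 5 batch = 50 + 50 = 100 points, exactly one minute's budget.
const cost = pointsNeeded(50, 5);
console.log(cost <= BUDGET_PER_MINUTE ? "Fits in one window" : "Spread over multiple windows");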


Best Practices

To ensure smooth usage and prevent interruptions:

Implement Retry Logic

Handle 429 Too Many Requests responses by respecting the X-RateLimit-RetryAfter header (or the retryAfter field in the response body). Use exponential backoff strategies to space retries.
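A minimal retry wrapper might look like the sketch below: it retries on 429, prefers the server-provided wait time, and falls back to exponential backoff. The URL and request options are placeholders.

// Sketch: retry a request on 429, honoring X-RateLimit-RetryAfter and
// falling back to exponential backoff. URLs and options are placeholders.
async function fetchWithRetry(url: string, init: RequestInit = {}, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, init);
    if (response.status !== 429) return response;

    // Prefer the server's hint; otherwise back off exponentially (1s, 2s, 4s, ...).
    const retryAfter = Number(response.headers.get("X-RateLimit-RetryAfter"));
    const waitMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : 2 ** attempt * 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  throw new Error(`Still rate limited after ${maxRetries} retries.`);
}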

Monitor Rate Usage

Use the X-RateLimit-Remaining header to throttle requests dynamically based on usage.
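One way to do this is to pause proactively when the remaining budget gets low, rather than waiting for a 429. In the hedged sketch below, the threshold and fallback pause length are arbitrary tuning values.

// Sketch: slow down proactively when the remaining budget is low.
// The threshold (5) and fallback pause (10s) are arbitrary tuning values.
async function throttleIfLow(response: Response): Promise<void> {
  const remaining = Number(response.headers.get("X-RateLimit-Remaining"));
  const retryAfter = Number(response.headers.get("X-RateLimit-RetryAfter"));
  if (Number.isFinite(remaining) && remaining < 5) {
    // Wait roughly until the window resets before sending more requests.
    const waitMs = (Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter : 10) * 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}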

Cache Results

Avoid redundant queries by caching frequent responses where applicable.
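For example, a short-lived in-memory cache keyed by query text can absorb repeated searches; the 60-second TTL below is an arbitrary choice, and the search callback is a placeholder for your own API call.

// Sketch: tiny in-memory cache for repeated search queries (TTL is arbitrary).
const CACHE_TTL_MS = 60_000;
const cache = new Map<string, { expires: number; value: unknown }>();

async function cachedSearch(
  query: string,
  search: (q: string) => Promise<unknown>,
): Promise<unknown> {
  const hit = cache.get(query);
  if (hit && hit.expires > Date.now()) return hit.value;  // serve without an API call

  const value = await search(query);                      // cache miss: call the API once
  cache.set(query, { expires: Date.now() + CACHE_TTL_MS, value });
  return value;
}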

Fail-Open Policy

If our rate-limiting infrastructure experiences an internal issue, it fails open: requests are allowed through temporarily so that your traffic is not disrupted.


Need Higher Limits?

For high-throughput use cases or enterprise-scale automation, contact us to explore custom rate limits for your workspace.
