Rate Limits
To ensure fair use and maintain high performance across all customers, Doti’s API enforces rate limits. This guide explains our policies and how to design resilient applications around them.
We enforce three types of rate limits:
Search API Rate Limits – Controls how many search requests can be made per minute.
Request Token Limits – Controls the size of an individual request payload.
Documents API Rate Limits – Controls ingestion operations with a points-based system.
1. Search API Rate Limits
Doti applies a fixed-window rate limiting algorithm on search endpoints to manage usage.
Default Policy
By default, search endpoints allow up to 100 requests per one-minute window (the capacity reflected in the response below).
If the rate limit is exceeded, the API responds with:
HTTP 429 Too Many Requests
{
  "error": "Rate limit exceeded",
  "message": "Too many search requests. Please try again later.",
  "retryAfter": 59,
  "remaining": 0,
  "capacity": 100
}
Response Headers
X-RateLimit-Remaining: Requests left in the current window
X-RateLimit-Capacity: Total allowed requests per window
X-RateLimit-RetryAfter: Time (in seconds) before retrying
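The sketch below shows one way to read these headers after a search call. The endpoint URL, request shape, and DOTI_API_KEY environment variable are placeholders for illustration, not documented values.

// Sketch: inspect Doti's rate-limit headers after a search request.
// The URL and auth scheme below are assumptions; substitute your real endpoint and credentials.
async function searchWithRateInfo(query: string): Promise<unknown> {
  const res = await fetch("https://api.doti.example/v1/search", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.DOTI_API_KEY}`,
    },
    body: JSON.stringify({ query }),
  });

  const remaining = Number(res.headers.get("X-RateLimit-Remaining"));
  const capacity = Number(res.headers.get("X-RateLimit-Capacity"));

  if (res.status === 429) {
    // Window exhausted; wait the advertised number of seconds before retrying.
    const retryAfter = Number(res.headers.get("X-RateLimit-RetryAfter"));
    throw new Error(`Rate limited: retry in ${retryAfter}s`);
  }

  console.log(`Search budget: ${remaining}/${capacity} requests left in this window`);
  return res.json();
}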
2. Request Token Limits
To ensure optimal performance and avoid overly heavy payloads, Doti limits each API call based on its token count (tokens are small chunks of text).
Token Limit Policy
Maximum tokens per request: 40,000
If a request exceeds this limit, the API returns an error:
Maximum tokens per request is 40000. Your messages resulted in 41234 tokens. Please reduce the length of the messages.
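As a rough client-side guard, you can estimate token counts before sending. The sketch below uses a crude characters-per-token heuristic (about 4 characters per token for English-like text); this is an approximation for pre-flight checks, not Doti's actual tokenizer.

const MAX_TOKENS_PER_REQUEST = 40_000;

// Rough heuristic: ~4 characters per token. The server's tokenizer is authoritative.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function assertWithinTokenLimit(messages: string[]): void {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  if (total > MAX_TOKENS_PER_REQUEST) {
    throw new Error(
      `Estimated ${total} tokens exceeds the ${MAX_TOKENS_PER_REQUEST}-token limit; ` +
        "shorten or split the request before sending."
    );
  }
}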
3. Documents API Rate Limits
The Documents API uses a points-based rate limiting system to balance efficiency and throughput for ingestion workloads.
Single Document Endpoint: 1 point per request
Batch Endpoint: 10 points per request
Example (with 100 points/minute limit)
100 single-document requests, OR
10 batch requests, OR
A mix (e.g., 50 single + 5 batch).
This system accounts for the higher processing load of batch ingestion.
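The cost of any mix is simply singles × 1 + batches × 10 points. A small helper for planning a minute's worth of ingestion, using the example 100-point budget above (your workspace limit may differ):

const POINTS_PER_SINGLE = 1;
const POINTS_PER_BATCH = 10;
const POINTS_PER_MINUTE = 100; // example budget from above

// Returns true if the planned mix of ingestion calls fits in one minute's point budget.
function fitsInPointsBudget(singleRequests: number, batchRequests: number): boolean {
  const cost = singleRequests * POINTS_PER_SINGLE + batchRequests * POINTS_PER_BATCH;
  return cost <= POINTS_PER_MINUTE;
}

fitsInPointsBudget(100, 0); // true:  100 points
fitsInPointsBudget(0, 10);  // true:  100 points
fitsInPointsBudget(50, 5);  // true:  50 + 50 = 100 points
fitsInPointsBudget(60, 5);  // false: 60 + 50 = 110 points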
Best Practices
To ensure smooth usage and prevent interruptions:
Implement Retry Logic
Handle 429 Too Many Requests responses by respecting the X-RateLimit-RetryAfter header (or the retryAfter field in the response body). Use exponential backoff strategies to space retries.
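A minimal retry wrapper, assuming a doRequest function that performs the call and returns a fetch Response; the retry count and backoff base below are illustrative choices, not prescribed values.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries on 429, preferring the server's RetryAfter hint and
// falling back to exponential backoff (1s, 2s, 4s, ...).
async function withRetry(
  doRequest: () => Promise<Response>,
  maxRetries = 5
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) {
      return res;
    }
    const retryAfter = Number(res.headers.get("X-RateLimit-RetryAfter"));
    const delayMs =
      Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 1000;
    await sleep(delayMs);
  }
}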
Monitor Rate Usage
Use the X-RateLimit-Remaining header to throttle requests dynamically based on usage.
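One simple approach is to pause briefly once the remaining budget runs low; the threshold and pause duration below are arbitrary illustrative values to tune for your workload.

// Backs off when the current window's remaining budget is nearly exhausted.
async function throttleIfLow(res: Response): Promise<void> {
  const header = res.headers.get("X-RateLimit-Remaining");
  if (header === null) return; // header absent; nothing to act on
  if (Number(header) <= 5) {
    await new Promise<void>((resolve) => setTimeout(resolve, 2000)); // brief pause before the next call
  }
}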
Cache Results
Avoid redundant queries by caching frequent responses where applicable.
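Even a small in-memory cache with a short TTL can cut request volume for repeated identical searches. The 60-second TTL below is an illustrative choice; pick one that matches how fresh your results need to be.

// Tiny in-memory cache keyed by query string.
const cache = new Map<string, { expiresAt: number; value: unknown }>();
const TTL_MS = 60_000;

async function cachedSearch(
  query: string,
  search: (q: string) => Promise<unknown>
): Promise<unknown> {
  const hit = cache.get(query);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value; // served from cache: no API call, no rate-limit cost
  }
  const value = await search(query);
  cache.set(query, { expiresAt: Date.now() + TTL_MS, value });
  return value;
}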
Fail-Open Policy
In the rare event of an internal issue, our rate-limiting infrastructure fails open, temporarily allowing requests through to avoid disrupting your application.
Need Higher Limits?
For high-throughput use cases or enterprise-scale automation, contact us to explore custom rate limits for your workspace.