Rate limits
MCP `call_tool` invocations are rate-limited per API key and per brand. The limiter is fail-closed: if its backing store is unavailable, calls are denied rather than allowed through. Over-limit calls return a retryable error envelope.
Only `call_tool` counts. `list_tools` calls and any auth/permission rejections are not rate-limited and do not consume quota.
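The accounting rules above amount to a gating order. The following is a minimal sketch, not the service's implementation. It assumes that auth and permission checks run before the limiter, and that the limiter is exposed as a hypothetical `consume()` hook that records one call and may raise when its backing store is unreachable. `gate` and `Decision` are illustrative names.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    REJECTED_AUTH = "rejected_auth"          # auth/permission failure: limiter never consulted
    ALLOWED_UNMETERED = "allowed_unmetered"  # e.g. list_tools: not rate-limited
    ALLOWED = "allowed"                      # call_tool with room in both windows
    RATE_LIMITED = "rate_limited"            # over quota, or limiter unavailable


def gate(method: str, authorized: bool, consume: Callable[[], bool]) -> Decision:
    """Decide whether a request proceeds (illustrative sketch).

    `consume` is a hypothetical hook: it records one call against the
    per-key and per-brand windows and returns True if both have room.
    It may raise if the limiter's backing store cannot be reached.
    """
    # Rejections happen first, so they never consume quota.
    if not authorized:
        return Decision.REJECTED_AUTH
    # Only call_tool is metered; every other method passes through.
    if method != "call_tool":
        return Decision.ALLOWED_UNMETERED
    try:
        ok = consume()
    except Exception:
        # Fail closed: if the limiter cannot answer, deny the call.
        return Decision.RATE_LIMITED
    return Decision.ALLOWED if ok else Decision.RATE_LIMITED
```

Both "over quota" and "limiter unavailable" map to the same denial here. That mirrors the fail-closed rule: from the caller's side, each looks like a retryable rate-limit error.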
Limits
There are two enforcement dimensions plus an optional cost ceiling.
| Dimension | Default limit | Window | Notes |
|---|---|---|---|
| Per API key | 60 calls | 60 s (rolling) | Each key has its own counter. |
| Per brand | 300 calls | 60 s (rolling) | Aggregate across all of a brand's keys. |
| Cost ceiling (optional) | Off by default | Per-brand window | A per-brand cost-unit cap, enforced only when configured. |
Defaults vs. plan quotas. The numbers above are the engine defaults. Your plan may set different effective values, because every quota number is configuration rather than a hard-coded constant. On the Growth plan, the published targets are 60 calls/min per key (with burst headroom), 5 active scoped keys, and 50,000 included `call_tool` executions per month. Enterprise limits are negotiated. The monthly inclusion is a soft limit: as you near exhaustion you are warned and offered an upgrade, but calls are not cut off mid-cycle.
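The two windows in the table behave like two rolling counters, and a call has to clear both of them. The following is a minimal in-memory sketch of that idea. It assumes a single process and per-key and per-brand deques of timestamps. The class `RollingWindowLimiter` is hypothetical and is not the service's implementation, which uses a backing store and fails closed when that store is unavailable. The limits are constructor arguments because the real values are plan configuration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class RollingWindowLimiter:
    """Illustrative per-key + per-brand rolling-window limiter."""

    def __init__(self, per_key: int = 60, per_brand: int = 300, window_s: float = 60.0):
        self.per_key = per_key
        self.per_brand = per_brand
        self.window_s = window_s
        self._key_hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._brand_hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, hits: Deque[float], now: float) -> None:
        # Drop timestamps that have slid out of the rolling window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()

    def allow(self, api_key: str, brand: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        key_hits = self._key_hits[api_key]
        brand_hits = self._brand_hits[brand]
        self._prune(key_hits, now)
        self._prune(brand_hits, now)
        # The call must fit in BOTH windows. In this sketch a denied
        # call is not recorded, so it does not use up quota.
        if len(key_hits) >= self.per_key or len(brand_hits) >= self.per_brand:
            return False
        key_hits.append(now)
        brand_hits.append(now)
        return True
```

With the defaults, five keys that each send 60 calls in the same minute reach the 300-call brand cap together. Each key is within its own limit, yet the brand window is full.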
What you get when limited
An over-limit call returns the standard error envelope with:
- `isError: true`
- `error_class: "retryable"`
- a generic `sanitized_error`, e.g. *"Rate limit exceeded. Please wait before sending more requests."*
No internal counters, brand ids, or key material appear in the response.
```json
{
  "result": {
    "content": [
      { "type": "text", "text": "Rate limit exceeded. Please wait before sending more requests." }
    ],
    "isError": true
  }
}
```

Internally, this is a retryable envelope that carries a `retry_hint` (`retry_after_ms`, `max_attempts`, `backoff: "fixed"`). Treat it like any other retryable error: back off and retry.
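The following is a minimal client-side sketch of "back off and retry" with a fixed backoff. It assumes your MCP client surfaces the `retry_hint` fields named above. The envelope example shows only `isError`, so where the hint appears on the wire is an assumption. `send` and `is_retryable` are hypothetical caller-supplied functions. The fallback delay and attempt count are placeholders, not documented defaults.

```python
import random
import time
from typing import Any, Callable, Dict, Optional

Message = Dict[str, Any]


def call_with_retry(
    send: Callable[[], Message],
    is_retryable: Callable[[Message], bool],
    retry_hint: Optional[Dict[str, Any]] = None,
    sleep: Callable[[float], None] = time.sleep,
    jitter_ratio: float = 0.2,
) -> Message:
    """Retry one call_tool request on retryable errors (illustrative sketch)."""
    hint = retry_hint or {}
    # Placeholder fallbacks, used only when no hint is available.
    delay_s = hint.get("retry_after_ms", 1000) / 1000.0
    max_attempts = max(1, int(hint.get("max_attempts", 3)))

    response: Message = send()
    for _ in range(max_attempts - 1):
        if not is_retryable(response):
            return response  # success, or an error that should not be retried
        # backoff "fixed": the same base delay every time. Jitter only
        # lengthens it, so a retry never fires earlier than the hint asks.
        sleep(delay_s * (1 + random.uniform(0, jitter_ratio)))
        response = send()
    return response  # last response once attempts are exhausted
```

Jitter is added on top of `retry_after_ms` rather than spread around it. This keeps retries at or after the hinted time, and it also stops many clients from retrying in lockstep.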
Rate-limit headers: not in v1
There are no HTTP rate-limit headers in v1. The v1 GA transport is MCP stdio, a JSON-RPC stream with no HTTP response headers, so there is no `X-RateLimit-Limit`, `X-RateLimit-Remaining`, or `Retry-After` header to read. Rate-limit state is communicated in-band through the `retryable` error envelope and its `retry_hint`, not through headers. HTTP-style rate-limit headers will become relevant only when the public HTTP/SSE gateway (mcp.qaqnuz.uz/v1) ships. That gateway is not yet deployed, because it depends on platform GATE-5 infrastructure. This page will be updated with the concrete header contract when the gateway is announced.
Designing within the limits
- Honor the retry hint. On a `retryable` result, wait `retry_after_ms` (with jitter) before retrying. Do not retry in a tight loop.
- Stay under the per-key window. The default is 60 calls/min per key. Spread out bursts, and split heavy workloads across the keys your plan allows.
- Mind the brand aggregate. All of a brand's keys share the per-brand window (default 300 calls/min). Several busy keys can therefore exhaust it together even when each stays under its own per-key limit.
- Expect fail-closed behavior. During a backend disruption the limiter denies calls rather than letting unmetered traffic through. A transient burst of retryable errors may mean the limiter is degraded, not that you are actually over quota. Back off the same way in either case.
- Cache reads. Avoid re-calling read tools for data you already have.
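The "Cache reads" point can be handled on the client side. The following is a minimal sketch of a time-to-live cache placed in front of read-only tool calls, keyed by tool name and canonicalized arguments. `ReadCache`, `fetch`, and the 30-second TTL are hypothetical choices. Only wrap tools you know are read-only, because caching a write would skip it.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple

Message = Dict[str, Any]


class ReadCache:
    """TTL cache for read-only call_tool results (illustrative sketch).

    `fetch` is a hypothetical function that performs one call_tool
    request and returns the JSON-RPC message as a dict.
    """

    def __init__(self, fetch: Callable[[str, Dict[str, Any]], Message],
                 ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Message]] = {}

    def get(self, tool: str, args: Dict[str, Any]) -> Message:
        # Canonical JSON: {"a": 1, "b": 2} and {"b": 2, "a": 1} share one key.
        key = (tool, json.dumps(args, sort_keys=True))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]  # served locally: no call_tool, no quota consumed
        message = self._fetch(tool, args)
        result = message.get("result")
        # Cache only successful results, never error envelopes
        # (including rate-limit errors).
        if result is not None and not result.get("isError", False):
            self._entries[key] = (now, message)
        return message
```

Errors are deliberately left uncached. A rate-limit envelope is transient, and caching it would keep reporting the failure after the window has recovered.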