Rate limits

MCP call_tool invocations are rate-limited per API key and per brand. The limiter is fail-closed: if its backing store is unavailable, calls are denied rather than allowed through. Over-limit calls return a retryable error envelope.

Only call_tool counts. list_tools calls and auth or permission rejections are not rate-limited and do not consume quota.

Limits

There are two enforcement dimensions plus an optional cost ceiling.

Dimension	Default limit	Window	Notes
Per API key	60 calls	60 s (rolling)	Each key has its own counter.
Per brand	300 calls	60 s (rolling)	Aggregate across all of a brand's keys.
Cost ceiling (optional)	off by default	per brand window	A per-brand cost-unit cap, enforced only when configured.
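
To make the two dimensions and the fail-closed rule concrete, here is a minimal sketch of a rolling-window check. It is an illustration only, not the engine's implementation: `InMemoryStore`, `StoreUnavailable`, and `allow_call` are hypothetical names, the optional cost ceiling is left out, and the choice that a denied call does not add to the counters is an assumption.

```python
import time
from collections import defaultdict, deque


class StoreUnavailable(Exception):
    """Raised by the (hypothetical) backing store when it cannot be reached."""


class InMemoryStore:
    """Toy stand-in for the limiter's backing store: one timestamp deque per counter."""

    def __init__(self):
        self.hits = defaultdict(deque)
        self.available = True

    def recent(self, counter, now, window_s):
        if not self.available:
            raise StoreUnavailable(counter)
        q = self.hits[counter]
        # Drop timestamps that have rolled out of the window.
        while q and q[0] <= now - window_s:
            q.popleft()
        return q


def allow_call(store, api_key, brand, now=None,
               key_limit=60, brand_limit=300, window_s=60.0):
    """Return True if a call_tool invocation may proceed, False if it is denied."""
    now = time.monotonic() if now is None else now
    try:
        key_hits = store.recent(("key", api_key), now, window_s)
        brand_hits = store.recent(("brand", brand), now, window_s)
    except StoreUnavailable:
        return False  # fail-closed: deny when the store is down
    if len(key_hits) >= key_limit or len(brand_hits) >= brand_limit:
        return False  # over limit in at least one dimension
    # Assumption: only allowed calls are recorded against the counters.
    key_hits.append(now)
    brand_hits.append(now)
    return True
```

With the defaults, five keys of one brand that each make 60 calls inside the same window use up the whole 300-call brand aggregate, so a sixth key of that brand is denied even though its own counter is empty.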

Defaults vs. plan quotas. The numbers above are the engine defaults. Your plan may set different effective values: all quota numbers are configuration, not hard-coded constants. On the Growth plan the published targets are 60 calls/min/key (with burst headroom), 5 active scoped keys, and 50,000 included call_tool executions per month; Enterprise is negotiated. The monthly included quota is a soft limit: you are warned and offered an upgrade near exhaustion, not cut off mid-cycle.

What you get when limited

An over-limit call returns the standard error envelope with:

isError: true
error_class: "retryable"
a generic sanitized_error, e.g. "Rate limit exceeded. Please wait before sending more requests."

No internal counters, brand ids, or key material appear in the response.

```json
{
  "result": {
    "content": [
      { "type": "text", "text": "Rate limit exceeded. Please wait before sending more requests." }
    ],
    "isError": true
  }
}
```

Internally this is a retryable envelope that carries a retry_hint (retry_after_ms, max_attempts, backoff: "fixed"). Treat it like any other retryable error: back off and retry.
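
A retry loop that honors the hint might look like the sketch below. It assumes the client has already extracted an envelope dict that exposes `isError`, `error_class`, and `retry_hint`; where exactly those fields appear on the wire is not specified on this page, so treat the dict shape, the `call_with_retry` name, and the fallback defaults as hypothetical.

```python
import random
import time


def call_with_retry(call, default_wait_ms=1000, default_attempts=3,
                    sleep=time.sleep, rng=random.random):
    """Invoke call() and retry while it returns a retryable error envelope.

    `call` is a hypothetical zero-argument function returning a dict shaped like
    {"isError": bool, "error_class": str, "retry_hint": {...}, ...}.
    """
    attempt = 1
    while True:
        envelope = call()
        if not envelope.get("isError") or envelope.get("error_class") != "retryable":
            return envelope  # success, or an error that retrying will not fix
        hint = envelope.get("retry_hint") or {}
        max_attempts = hint.get("max_attempts", default_attempts)
        if attempt >= max_attempts:
            return envelope  # attempts exhausted; hand the error to the caller
        wait_ms = hint.get("retry_after_ms", default_wait_ms)
        # Fixed backoff plus up to 25% random jitter so clients do not retry in lockstep.
        sleep(wait_ms * (1.0 + 0.25 * rng()) / 1000.0)
        attempt += 1
```

The `sleep` and `rng` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults apply and the helper is called as `call_with_retry(lambda: invoke_tool(...))`, where `invoke_tool` stands for whatever client function you already have.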

Rate-limit headers: not in v1

There are no HTTP rate-limit headers in v1. The v1 GA transport is MCP stdio, which is a JSON-RPC stream with no HTTP response headers — so there is no X-RateLimit-Limit / X-RateLimit-Remaining / Retry-After header to read. Rate-limit state is communicated in-band via the retryable error envelope and its retry_hint, not via headers. HTTP-style rate-limit headers will become relevant only when the public HTTP/SSE gateway (mcp.qaqnuz.uz/v1) ships — that gateway is not yet deployed (it depends on platform GATE-5 infrastructure). This page will be updated with the concrete header contract when the gateway is announced.

Designing within the limits

Honor the retry hint. On a retryable result, wait retry_after_ms (with jitter) before retrying; do not hammer.

Stay under the per-key window. Default is 60 calls/min/key. Spread bursts, and split heavy workloads across the keys your plan allows.

Mind the brand aggregate. All of a brand's keys share the per-brand window (default 300 calls/min). Many busy keys can collectively hit it.

Expect fail-closed behavior. During a backend disruption the limiter denies rather than allowing unmetered traffic. A transient burst of retryable errors can mean the limiter is degraded, not that you are actually over quota; back off the same way.

Cache reads. Avoid re-calling read tools for data you already have.
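
One way to stay under the per-key window is to pace calls on the client before they are sent. The sketch below is a simple sliding-window pacer; `WindowPacer` is a hypothetical helper, not part of any SDK, and because the client clock and the server's rolling window are not synchronized it is safer to configure it somewhat below your effective limit.

```python
import time
from collections import deque


class WindowPacer:
    """Client-side sliding-window pacer: at most `limit` calls per `window_s` seconds."""

    def __init__(self, limit=60, window_s=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of calls still inside the window

    def wait_time(self, now):
        """Seconds to wait before the next call fits in the window (0.0 if it fits now)."""
        while self.sent and self.sent[0] <= now - self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.sent[0] + self.window_s - now

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self.clock()
            delay = self.wait_time(now)
            if delay <= 0.0:
                self.sent.append(now)
                return
            self.sleep(delay)
```

Call `pacer.acquire()` immediately before each call_tool invocation. Use one pacer per key, for example `WindowPacer(limit=50)` to leave headroom under the 60 calls/min default. This does not account for the per-brand aggregate, which other keys of the same brand also consume, so retryable errors still need to be handled.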