
Rate limits

The core API is unmetered; the LLM gateway runs a 120-req/min token bucket per API key. Standard X-RateLimit-* headers are not emitted.

Most ProxifAI endpoints are not rate-limited in OSS — Postgres is the only backpressure. The exception is the LLM gateway, which runs a token bucket per API key to prevent runaway agent loops from saturating provider quotas. There's no X-RateLimit-* header surface: when the bucket is empty the gateway returns 429 with a JSON body, and otherwise the request passes through as a normal 200 with no rate-limit headers.

Where rate limits exist

| Surface | Limit | Configurable | Enforced by |
| --- | --- | --- | --- |
| Core REST API | None | | |
| LLM gateway | 120 req/min, burst 20, per API key | Yes — see below | `internal/llmgateway/middleware/ratelimit.go` |
| Inbound webhooks | None at platform level | | Provider-specific (e.g. Slack signing rate) |
| MCP server | None | | |
| Git protocol | None | | Postgres-bound |

If a POST /api/v1/issues storm gets throttled, the bottleneck is your database, not middleware. The right answer is to scale Postgres or batch differently — there's no platform knob to turn.

LLM gateway specifics

Defaults configured in internal/llmgateway/embedded/embedded.go:

ratelimit.New(120, 20)   // 120 requests/minute, burst 20

The bucket is keyed on the API key (or pfai_<execID> for workflow tokens) — every distinct key gets its own bucket. When a request arrives and its bucket is empty, the gateway returns:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{"error":{"message":"rate limit exceeded","type":"rate_limit_error"}}

There’s no Retry-After header by default. Back off ~500 ms and retry; the bucket refills at 2 tokens/sec.

Configurable per-key limits

Beyond the gateway-wide default, you can set finer-grained limits via the gateway management API:

pfai gateway rate-limits set \
  --scope user --user [email protected] \
  --requests-per-min 60 --burst 10

| Scope | Granularity |
| --- | --- |
| `org` | Whole org's traffic (applies on top of the gateway-wide default) |
| `team` | Specific team |
| `project` | Specific project |
| `user` | Specific user — useful for individual agents/contributors |
| `key` | Specific PAT |

Limits compose restrictively: every request must clear every applicable bucket, so the tightest applicable scope is the effective limit.

REST surface:

| Method · Path | Purpose |
| --- | --- |
| `GET /api/v1/gateway/rate-limits` | List all configured limits |
| `POST /api/v1/gateway/rate-limits` | Create — body: `{scope, scopeId, requestsPerMin, burst}` |
| `DELETE /api/v1/gateway/rate-limits/{id}` | Remove a limit |

Headers

The gateway does not emit standard X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset headers in OSS. Track quota client-side (you know your key’s limit) or query GET /api/v1/gateway/usage/timeline for retrospective analysis.

The CORS policy allows X-Request-ID and Idempotency-Key for the instances API, but neither relates to rate-limit signaling.

Best practices

  • Cache reads. Issues, projects, workflows rarely change; cache the response with the updatedAt timestamp and refetch only on staleness.
  • Use streaming for chat. SSE streams from /api/v1/llm/v1/chat/completions count as one request — much better than chunked polling.
  • Batch where supported. DELETE /api/v1/issues (no ID) accepts a body of IDs to delete in bulk. Saves N round-trips.
  • Use pfai wait for async. Polling a workflow execution every 100 ms wastes both client and server cycles. pfai wait and the SSE event stream do the right thing without a timer.
  • Respect circuit breakers. When the gateway returns 503 because every provider for a model has tripped, don’t retry the same model — try a different one or back off.
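The first bullet can be sketched as a tiny updatedAt-keyed cache. Field names and the resource shape here are illustrative, not ProxifAI's actual schema:

```go
package main

import "fmt"

// entry pairs a cached response body with the updatedAt stamp it was fetched at.
type entry struct {
	body      string
	updatedAt string
}

// cache refetches only when the server-side updatedAt has moved past the cached one.
type cache struct {
	entries map[string]entry
	fetches int // counts real fetches, for illustration
}

// get takes the resource ID, its current updatedAt (e.g. from a cheap list call),
// and a fetch function that performs the expensive GET.
func (c *cache) get(id, updatedAt string, fetch func(string) string) string {
	if e, ok := c.entries[id]; ok && e.updatedAt == updatedAt {
		return e.body // fresh enough, skip the round-trip
	}
	c.fetches++
	body := fetch(id)
	c.entries[id] = entry{body: body, updatedAt: updatedAt}
	return body
}

func main() {
	c := &cache{entries: map[string]entry{}}
	fetch := func(id string) string { return "issue " + id }

	c.get("42", "2024-05-01T10:00:00Z", fetch) // miss: fetches
	c.get("42", "2024-05-01T10:00:00Z", fetch) // hit: same updatedAt
	c.get("42", "2024-05-02T09:00:00Z", fetch) // stale: refetches
	fmt.Println(c.fetches) // 2
}
```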

See also