# Rate limits
The core API is unmetered; the LLM gateway runs a 120 req/min token bucket per API key. Standard `X-RateLimit-*` headers are not emitted.

Most ProxifAI endpoints are not rate-limited in OSS — Postgres is the only backpressure. The exception is the LLM gateway, which runs a token bucket per API key to prevent runaway agent loops from saturating provider quotas. There’s no `X-RateLimit-*` header surface; the gateway returns 429 with a JSON body when the bucket is empty and passes the request through otherwise.
## Where rate limits exist
| Surface | Limit | Configurable | Enforced by |
|---|---|---|---|
| Core REST API | None | — | — |
| LLM gateway | 120 req/min, burst 20, per API key | Yes — see below | `internal/llmgateway/middleware/ratelimit.go` |
| Inbound webhooks | None at platform level | — | Provider-specific (e.g. Slack signing rate) |
| MCP server | None | — | — |
| Git protocol | None | — | Postgres-bound |
If you’re rate-limited during a `POST /api/v1/issues` storm, the bottleneck is your database, not a middleware. The right answer is to scale Postgres or batch differently; there’s no platform knob to turn.
## LLM gateway specifics
Defaults are configured in `internal/llmgateway/embedded/embedded.go`:

```go
ratelimit.New(120, 20) // 120 requests/minute, burst 20
```
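Those defaults translate to a refill rate of 2 tokens/sec (120/min) with a burst capacity of 20. A minimal sketch of the same semantics — hypothetical code, not the actual middleware in `internal/llmgateway/middleware/ratelimit.go`:

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a toy token bucket: capacity = burst, refilling at perMin/60 tokens per second.
type bucket struct {
	tokens   float64
	capacity float64
	perSec   float64
	last     time.Time
}

func newBucket(perMin, burst int) *bucket {
	return &bucket{
		tokens:   float64(burst),
		capacity: float64(burst),
		perSec:   float64(perMin) / 60,
		last:     time.Now(),
	}
}

// allow refills based on elapsed time, then spends one token if available.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.perSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := newBucket(120, 20) // gateway defaults
	start := time.Now()
	ok := 0
	for i := 0; i < 30; i++ { // 30 back-to-back requests with no time elapsed
		if b.allow(start) {
			ok++
		}
	}
	fmt.Println(ok) // the burst of 20 passes; the remaining 10 are rejected
}
```

The practical takeaway: you can fire 20 requests instantly, after which you’re paced at 2/sec.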
The bucket is keyed on the API key (or `pfai_<execID>` for workflow tokens) — every distinct key gets its own bucket. When the bucket is empty, the gateway returns:
```
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{"error":{"message":"rate limit exceeded","type":"rate_limit_error"}}
```
There’s no `Retry-After` header by default. Back off ~500 ms and retry; the bucket refills at 2 tokens/sec.
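A client-side retry wrapper matching that guidance might look like the following sketch. The `do` callback is a stand-in for your actual HTTP call; the wrapper itself only looks at the status code:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryOn429 calls do() and retries on HTTP 429 with a fixed backoff
// (~500 ms in production, matching the 2 tokens/sec refill).
// Transport errors and non-429 statuses are returned immediately.
func retryOn429(do func() (int, error), maxRetries int, backoff time.Duration) (int, error) {
	var status int
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		status, err = do()
		if err != nil || status != 429 {
			return status, err
		}
		time.Sleep(backoff)
	}
	return status, errors.New("rate limited: retries exhausted")
}

func main() {
	// Stub transport: rejected twice, then accepted.
	calls := 0
	do := func() (int, error) {
		calls++
		if calls <= 2 {
			return 429, nil
		}
		return 200, nil
	}
	status, err := retryOn429(do, 5, time.Millisecond) // short backoff for the demo
	fmt.Println(status, err, calls)                    // succeeds on the 3rd call
}
```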
## Configurable per-key limits
Beyond the gateway-wide default, you can set finer-grained limits via the gateway management API:
```shell
pfai gateway rate-limits set \
  --scope user --user [email protected] \
  --requests-per-min 60 --burst 10
```
| Scope | Granularity |
|---|---|
| `org` | Whole org’s traffic share (additive on top of gateway-wide) |
| `team` | Specific team |
| `project` | Specific project |
| `user` | Specific user — useful for individual agents/contributors |
| `key` | Specific PAT |
Limits compose conjunctively: every request must clear every applicable bucket, so the tightest scope wins.
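That composition rule can be sketched as a conjunction over all applicable buckets (hypothetical types; the real enforcement lives in the gateway middleware):

```go
package main

import "fmt"

// limiter is any per-scope bucket (org, team, project, user, key).
type limiter interface{ Allow() bool }

// fixed is a toy bucket with a flat remaining count, standing in for a real token bucket.
type fixed struct{ remaining int }

func (f *fixed) Allow() bool {
	if f.remaining <= 0 {
		return false
	}
	f.remaining--
	return true
}

// allowAll admits a request only if every applicable scope's bucket has
// capacity — the tightest scope is always the effective limit.
func allowAll(buckets ...limiter) bool {
	for _, b := range buckets {
		if !b.Allow() {
			return false
		}
	}
	return true
}

func main() {
	org := &fixed{remaining: 100}
	user := &fixed{remaining: 1}     // tightest scope
	fmt.Println(allowAll(org, user)) // true: both buckets have capacity
	fmt.Println(allowAll(org, user)) // false: the user bucket is exhausted
}
```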
REST surface:
| Method · Path | Purpose |
|---|---|
| `GET /api/v1/gateway/rate-limits` | List all configured limits |
| `POST /api/v1/gateway/rate-limits` | Create — `{scope, scopeId, requestsPerMin, burst}` |
| `DELETE /api/v1/gateway/rate-limits/{id}` | Remove |
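Building the create call from Go might look like this sketch. The base URL and the bearer-token auth scheme are placeholders (assumptions, not confirmed by this page); only the path and body shape come from the table above:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rateLimitSpec mirrors the documented create body: {scope, scopeId, requestsPerMin, burst}.
type rateLimitSpec struct {
	Scope          string `json:"scope"`
	ScopeID        string `json:"scopeId"`
	RequestsPerMin int    `json:"requestsPerMin"`
	Burst          int    `json:"burst"`
}

// newCreateRequest marshals the spec and builds the POST request without sending it.
func newCreateRequest(baseURL, token string, spec rateLimitSpec) (*http.Request, error) {
	body, err := json.Marshal(spec)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/gateway/rate-limits", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token) // assumed auth scheme
	return req, nil
}

func main() {
	spec := rateLimitSpec{Scope: "user", ScopeID: "[email protected]", RequestsPerMin: 60, Burst: 10}
	req, err := newCreateRequest("https://pfai.example.com", "TOKEN", spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /api/v1/gateway/rate-limits
}
```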
## Headers
The gateway does not emit standard `X-RateLimit-Limit` / `X-RateLimit-Remaining` / `X-RateLimit-Reset` headers in OSS. Track quota client-side (you know your key’s limit) or query `GET /api/v1/gateway/usage/timeline` for retrospective analysis.
CORS allows `X-Request-ID` and `Idempotency-Key` for the instances API, but neither relates to rate-limit signaling.
## Best practices
- Cache reads. Issues, projects, and workflows rarely change; cache the response with the `updatedAt` timestamp and refetch only on staleness.
- Use streaming for chat. SSE streams from `/api/v1/llm/v1/chat/completions` count as one request — much better than chunked polling.
- Batch where supported. `DELETE /api/v1/issues` (no ID) accepts a body of IDs to delete in bulk, saving N round-trips.
- Use `pfai wait` for async. Polling a workflow execution every 100 ms wastes both client and server cycles; `pfai wait` and the SSE event stream do the right thing without a timer.
- Respect circuit breakers. When the gateway returns `503` because every provider for a model has tripped, don’t retry the same model — try a different one or back off.