Billing & Usage
How LLM usage, credit enforcement, and cost reporting work in OSS vs Enterprise — and what's a no-op without account-api.
ProxifAI’s billing surface differs sharply between the OSS build and the Enterprise/SaaS distribution. The OSS build records token usage faithfully but doesn’t enforce credit budgets and reports $0 cost per call (no pricing tables registered). Enterprise wires up account-api for credit enforcement and registers a real cost calculator. This page explains both modes and how to add custom pricing in OSS.
What’s recorded vs what’s enforced
```
LLM call
   │
   ▼
gateway middleware ──► token count ─┬── usage.Event published to NATS LLM_USAGE stream
   │                                │     (always, OSS or EE)
   │                                │
   │                                └── pricing.CalculateCost
   │                                      ├─ OSS: returns 0
   │                                      └─ EE: current per-model rates
   │
   └── credit check (optional, before request)
         ├─ OSS without account-api: no-op (allows)
         └─ EE/SaaS: ErrInsufficientCredits → 402
```
Net effect: in OSS, you get accurate token counts and a usage timeline, but the `estimatedCostUsd` field is always 0 and nothing prevents a runaway workflow from burning provider quota. In Enterprise/SaaS, the same surface enforces a USD-denominated credit balance in the LLM gateway middleware.
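That split can be sketched in a few lines. The `UsageEvent` struct and `record` helper below are illustrative stand-ins for the real `internal/llmusage` types, not their actual definitions; the point is that tokens are always captured while cost stays 0 whenever no calculator is registered:

```go
package main

import "fmt"

// UsageEvent mirrors the fields this page describes on usage.Event
// (field names here are illustrative, not the real schema).
type UsageEvent struct {
	Model            string
	PromptTokens     int
	CompletionTokens int
	CacheHit         bool
	EstimatedCostUsd float64
}

// costCalculator stands in for pricing.CalculateCost: nil in a vanilla
// OSS build (no pricing tables registered), set by EE or a custom hook.
var costCalculator func(model string, prompt, completion int) float64

func record(model string, prompt, completion int) UsageEvent {
	ev := UsageEvent{Model: model, PromptTokens: prompt, CompletionTokens: completion}
	if costCalculator != nil {
		ev.EstimatedCostUsd = costCalculator(model, prompt, completion)
	}
	// With no calculator, cost stays 0 while tokens are recorded faithfully.
	return ev
}

func main() {
	ev := record("gpt-4o", 1200, 300)
	fmt.Printf("tokens=%d/%d cost=%.2f\n", ev.PromptTokens, ev.CompletionTokens, ev.EstimatedCostUsd)
}
```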
OSS mode
In a vanilla OSS deployment (no account-api configured), every gateway call:
- Authenticates the token (PAT or `pfai_<execID>`)
- Rate-limits at 120 req/min, burst 20 — see AI Gateway
- Routes to a provider, captures token counts on success
- Publishes a `usage.Event` to the `LLM_USAGE` NATS JetStream stream
- The `internal/llmusage` consumer writes one row per event into the `gateway_usage` table
Reports are queryable:
| Method · Path | Returns |
|---|---|
| GET `/api/v1/gateway/usage` | Period totals (input + output tokens, request count) |
| GET `/api/v1/gateway/usage/by-model` | Per-model breakdown |
| GET `/api/v1/gateway/usage/by-workflow` | Attribution to chat sessions, agent runs, workflow executions |
| GET `/api/v1/gateway/usage/timeline` | Time-series for charts |
| GET `/api/v1/gateway/usage/log` | Raw event log |
The Settings → AI Gateway page renders these. `pfai gateway usage` is the CLI surface.
Adding custom pricing in OSS
The internal/llmgateway/pricing package exposes a hook (pricing.go):
```go
import "github.com/proxifai/proxifai-oss/internal/llmgateway/pricing"

func init() {
	pricing.CostCalculator = func(model string, prompt, completion int) float64 {
		rates := map[string]struct{ in, out float64 }{
			"claude-sonnet-4.6": {3.00 / 1e6, 15.00 / 1e6}, // $/token
			"gpt-4o":            {2.50 / 1e6, 10.00 / 1e6},
			// …
		}
		r, ok := rates[model]
		if !ok {
			return 0
		}
		return float64(prompt)*r.in + float64(completion)*r.out
	}
}
```
Once registered, `usage.Event.estimatedCostUsd` populates with real USD per call, and the `/gateway/usage*` endpoints aggregate it. There’s no rate-table UI in OSS — the calculator is a Go function, not config.
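As a rough sanity check of the arithmetic, the same calculation can be run standalone. The `estimate` helper below is illustrative (it reuses the example rates, and is not anything the gateway exports):

```go
package main

import "fmt"

// perTokenRates reuses the example rates above; real deployments register
// whatever rates they need via pricing.CostCalculator.
var perTokenRates = map[string]struct{ in, out float64 }{
	"claude-sonnet-4.6": {3.00 / 1e6, 15.00 / 1e6},
	"gpt-4o":            {2.50 / 1e6, 10.00 / 1e6},
}

func estimate(model string, prompt, completion int) float64 {
	r, ok := perTokenRates[model]
	if !ok {
		return 0 // unknown model: cost stays unreported, matching the hook's fallback
	}
	return float64(prompt)*r.in + float64(completion)*r.out
}

func main() {
	// 1,200 prompt + 300 completion tokens on claude-sonnet-4.6:
	// 1200 × $3/M + 300 × $15/M = $0.0036 + $0.0045 ≈ $0.0081
	fmt.Printf("$%.4f\n", estimate("claude-sonnet-4.6", 1200, 300))
}
```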
Enterprise mode
When account-api is reachable (set `ACCOUNT_API_URL` or wire a `credits.Client`), the gateway adds a pre-request credit-check middleware that calls `CheckCredit(orgID)` before dispatching. If the org has a zero or negative balance and `is_enforced` is true, the gateway returns `402 Payment Required` with `{"error":{"message":"insufficient credits"}}`.
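The enforcement rule condenses to a few lines. The `creditState` struct and `checkCredit` function below are a sketch of the documented behavior, not the real middleware:

```go
package main

import (
	"fmt"
	"net/http"
)

// creditState holds the two fields the decision needs; the account-api
// returns more (see the per-org credit state table below).
type creditState struct {
	BalanceUsd float64
	IsEnforced bool
}

// checkCredit applies the documented rule: block with 402 only when the
// balance is zero or negative AND enforcement is on for the org.
func checkCredit(cs creditState) int {
	if cs.IsEnforced && cs.BalanceUsd <= 0 {
		return http.StatusPaymentRequired // 402
	}
	return http.StatusOK
}

func main() {
	fmt.Println(checkCredit(creditState{BalanceUsd: -1.50, IsEnforced: true}))  // blocked
	fmt.Println(checkCredit(creditState{BalanceUsd: -1.50, IsEnforced: false})) // allowed
}
```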
Per-org credit state from the account-api (credits/client.go):
| Field | Type | Meaning |
|---|---|---|
| `balance_usd` | float | Current balance |
| `max_drawdown_usd` | float | Soft floor — alerts fire as the balance approaches this |
| `is_enforced` | bool | If false, the org isn’t blocked even at zero balance |
| `workflow_cost_per_min_usd` | float | Per-minute compute cost for workflow execution containers |
| `agent_cost_per_min_usd` | float | Per-minute compute cost for agent execution containers |
Credit checks are cached for 5 seconds and protected by a circuit breaker (5 failures → 30-second open). The breaker fails open — if account-api is unreachable, requests proceed rather than being blocked. This is intentional: a credit-service outage shouldn’t take down the platform.
What deducts credits
In Enterprise:
| Source | Cost driver |
|---|---|
| LLM gateway calls | Token cost via `pricing.CalculateCost` (registered by EE) |
| Workflow execution time | `workflow_cost_per_min_usd` × container minutes |
| Agent execution time | `agent_cost_per_min_usd` × container minutes |
| Storage (artifacts, container volumes) | Depends on plan |
Compute cost is recorded by the instances reaper (`internal/reaper`) when containers terminate; it computes elapsed time and emits a usage event that the credit service deducts against.
Usage tracking dimensions
Both OSS and Enterprise report:
| Dimension | Where |
|---|---|
| Per model | `/gateway/usage/by-model` |
| Per workflow / chat session / agent run | `/gateway/usage/by-workflow` (uses the `pfai_<execID>` token’s claim) |
| Per team / per project | Filterable on every endpoint |
| Per user | Available via `executionId` → workflow → owner joins |
| Daily / weekly / monthly | `/gateway/usage/timeline?period=…` |
Token counts split into:
- `promptTokens` — input (prompt + system message + conversation history)
- `completionTokens` — generated output
- `cacheHit` — bool; provider cache hits when supported (Anthropic prompt caching, OpenAI cached completions)
Budget alerts and rate limits
Two related but distinct mechanisms:
| | Budget alerts | Rate limits |
|---|---|---|
| What | Notify when usage crosses a threshold | Block requests above a per-period quota |
| Hard stop? | No — informational only | Yes — returns 429 |
| Configured at | `/api/v1/gateway/budgets` | `/api/v1/gateway/rate-limits` |
| Granularity | Org / team / project | Per API key + per provider |
For hard limits, layer rate limits on top of budget alerts. Budget alerts can be delivered via Slack (through the integration) or the in-app inbox.
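The distinction boils down to two tiny decision functions — one reports, one rejects. These are sketches (names and signatures are illustrative, not the gateway's internals):

```go
package main

import "fmt"

// budgetAlert only reports that a threshold was crossed; the request
// that crossed it still goes through.
func budgetAlert(spentUsd, thresholdUsd float64) bool {
	return spentUsd >= thresholdUsd
}

// rateLimit actually rejects: anything over the per-period quota gets 429.
func rateLimit(requestsThisPeriod, quota int) int {
	if requestsThisPeriod > quota {
		return 429 // hard stop
	}
	return 200
}

func main() {
	fmt.Println(budgetAlert(105.0, 100.0)) // alert fires, nothing is blocked
	fmt.Println(rateLimit(121, 120))       // request is rejected
}
```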
REST endpoints
| Method · Path | Purpose |
|---|---|
| GET `/api/v1/gateway/usage` | Period totals |
| GET `/api/v1/gateway/usage/by-model` | Per-model breakdown |
| GET `/api/v1/gateway/usage/by-workflow` | Attribution |
| GET `/api/v1/gateway/usage/timeline` | Time series |
| GET `/api/v1/gateway/usage/log` | Raw events |
| GET · POST · DELETE `/api/v1/gateway/rate-limits[/{id}]` | Rate limit CRUD |
| GET · POST · DELETE `/api/v1/gateway/budgets[/{id}]` | Budget alert CRUD |
`pfai gateway usage`, `pfai gateway rate-limits`, and `pfai gateway budgets` wrap these.