Billing & Usage
How LLM usage, credit enforcement, and cost reporting work in OSS vs Enterprise — and what's a no-op without account-api.
ProxifAI’s billing surface differs sharply between the OSS build and the Enterprise/SaaS distribution. The OSS build records token usage faithfully but doesn’t enforce credit budgets and reports $0 cost per call (no pricing tables registered). Enterprise wires up account-api for credit enforcement and registers a real cost calculator. This page explains both modes and how to add custom pricing in OSS.
What’s recorded vs what’s enforced
```
LLM call
   │
   ▼
gateway middleware ──► token count ─┬── usage.Event published to NATS LLM_USAGE stream
   │                                │     (always, OSS or EE)
   │                                │
   │                                └── pricing.CalculateCost
   │                                      ├─ OSS: returns 0
   │                                      └─ EE: current per-model rates
   │
   └── credit check (optional, before request)
         ├─ OSS without account-api: no-op (allows)
         └─ EE/SaaS: ErrInsufficientCredits → 402
```
Net effect: in OSS, you get accurate token counts and a usage timeline, but the `estimatedCostUsd` field is always 0 and nothing prevents a runaway workflow from burning provider quota. In Enterprise/SaaS, the same surface enforces a USD-denominated credit balance in the LLM gateway middleware.
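That split can be sketched in a few lines. The `UsageEvent` struct and `record` helper below are illustrative stand-ins for the real `internal/llmusage` types, not their actual definitions; the point is that tokens are always captured while cost stays 0 whenever no calculator is registered:

```go
package main

import "fmt"

// UsageEvent mirrors the fields this page describes on usage.Event
// (field names here are illustrative, not the real schema).
type UsageEvent struct {
	Model            string
	PromptTokens     int
	CompletionTokens int
	CacheHit         bool
	EstimatedCostUsd float64
}

// costCalculator stands in for pricing.CalculateCost: nil in a vanilla
// OSS build (no pricing tables registered), set by EE or a custom hook.
var costCalculator func(model string, prompt, completion int) float64

func record(model string, prompt, completion int) UsageEvent {
	ev := UsageEvent{Model: model, PromptTokens: prompt, CompletionTokens: completion}
	if costCalculator != nil {
		ev.EstimatedCostUsd = costCalculator(model, prompt, completion)
	}
	// With no calculator, cost stays 0 while tokens are recorded faithfully.
	return ev
}

func main() {
	ev := record("gpt-4o", 1200, 300)
	fmt.Printf("tokens=%d/%d cost=%.2f\n", ev.PromptTokens, ev.CompletionTokens, ev.EstimatedCostUsd)
}
```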
OSS mode
In a vanilla OSS deployment (no account-api configured), every gateway call:
- Authenticates the token (PAT or `pfai_<execID>`)
- Rate-limits at 120 req/min, burst 20 — see AI Gateway
- Routes to a provider, captures token counts on success
- Publishes a `usage.Event` to the `LLM_USAGE` NATS JetStream stream
- The `internal/llmusage` consumer writes one row per event into the `gateway_usage` table
Reports are queryable:
| Method · Path | Returns |
|---|---|
| GET `/api/v1/gateway/usage` | Period totals (input + output tokens, request count) |
| GET `/api/v1/gateway/usage/by-model` | Per-model breakdown |
| GET `/api/v1/gateway/usage/by-workflow` | Attribution to chat sessions, agent runs, workflow executions |
| GET `/api/v1/gateway/usage/timeline` | Time-series for charts |
| GET `/api/v1/gateway/usage/log` | Raw event log |
The Settings → AI Gateway page renders these. `pfai gateway usage` is the CLI surface.
Adding custom pricing in OSS
The internal/llmgateway/pricing package exposes a hook (pricing.go):
```go
import "github.com/proxifai/proxifai-oss/internal/llmgateway/pricing"

func init() {
	pricing.CostCalculator = func(model string, prompt, completion int) float64 {
		rates := map[string]struct{ in, out float64 }{
			"claude-sonnet-4.6": {3.00 / 1e6, 15.00 / 1e6}, // $/token
			"gpt-4o":            {2.50 / 1e6, 10.00 / 1e6},
			// …
		}
		r, ok := rates[model]
		if !ok {
			return 0
		}
		return float64(prompt)*r.in + float64(completion)*r.out
	}
}
```
Once registered, `usage.Event.estimatedCostUsd` populates with real USD per call, and the `/gateway/usage*` endpoints aggregate it. There’s no rate-table UI in OSS — the calculator is a Go function, not config.
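As a rough sanity check of the arithmetic, the same calculation can be run standalone. The `estimate` helper below is illustrative (it reuses the example rates, and is not anything the gateway exports):

```go
package main

import "fmt"

// perTokenRates reuses the example rates above; real deployments register
// whatever rates they need via pricing.CostCalculator.
var perTokenRates = map[string]struct{ in, out float64 }{
	"claude-sonnet-4.6": {3.00 / 1e6, 15.00 / 1e6},
	"gpt-4o":            {2.50 / 1e6, 10.00 / 1e6},
}

func estimate(model string, prompt, completion int) float64 {
	r, ok := perTokenRates[model]
	if !ok {
		return 0 // unknown model: cost stays unreported, matching the hook's fallback
	}
	return float64(prompt)*r.in + float64(completion)*r.out
}

func main() {
	// 1,200 prompt + 300 completion tokens on claude-sonnet-4.6:
	// 1200 × $3/M + 300 × $15/M = $0.0036 + $0.0045 ≈ $0.0081
	fmt.Printf("$%.4f\n", estimate("claude-sonnet-4.6", 1200, 300))
}
```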
Enterprise mode
When account-api is reachable (set `ACCOUNT_API_URL` or wire a `credits.Client`), the gateway adds a pre-request credit-check middleware that calls `CheckCredit(orgID)` before dispatching. If the org has a zero or negative balance and `is_enforced` is true, the gateway returns `402 Payment Required` with `{"error":{"message":"insufficient credits"}}`.
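The enforcement rule condenses to a few lines. The `creditState` struct and `checkCredit` function below are a sketch of the documented behavior, not the real middleware:

```go
package main

import (
	"fmt"
	"net/http"
)

// creditState holds the two fields the decision needs; the account-api
// returns more (see the per-org credit state table below).
type creditState struct {
	BalanceUsd float64
	IsEnforced bool
}

// checkCredit applies the documented rule: block with 402 only when the
// balance is zero or negative AND enforcement is on for the org.
func checkCredit(cs creditState) int {
	if cs.IsEnforced && cs.BalanceUsd <= 0 {
		return http.StatusPaymentRequired // 402
	}
	return http.StatusOK
}

func main() {
	fmt.Println(checkCredit(creditState{BalanceUsd: -1.50, IsEnforced: true}))  // blocked
	fmt.Println(checkCredit(creditState{BalanceUsd: -1.50, IsEnforced: false})) // allowed
}
```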
Per-org credit state from the account-api (credits/client.go):
| Field | Type | Meaning |
|---|---|---|
| `balance_usd` | float | Current balance |
| `max_drawdown_usd` | float | Soft floor — alerts fire as the balance approaches this |
| `is_enforced` | bool | If false, the org isn’t blocked even at zero balance |
| `workflow_cost_per_min_usd` | float | Per-minute compute cost for workflow execution containers |
| `agent_cost_per_min_usd` | float | Per-minute compute cost for agent execution containers |
Credit checks are cached for 5 seconds and protected by a circuit breaker (5 failures → 30-second open). The breaker fails open — if account-api is unreachable, requests proceed rather than being blocked. This is intentional: a credit-service outage shouldn’t take down the platform.
What deducts credits
In Enterprise:
| Source | Cost driver |
|---|---|
| LLM gateway calls | Token cost via `pricing.CalculateCost` (registered by EE) |
| Workflow execution time | `workflow_cost_per_min_usd` × container minutes |
| Agent execution time | `agent_cost_per_min_usd` × container minutes |
| Storage (artifacts, container volumes) | Depends on plan |
Compute cost is recorded by the instances reaper (`internal/reaper`) when containers terminate; it computes elapsed time and emits a usage event that the credit service deducts against.
Usage tracking dimensions
Both OSS and Enterprise report:
| Dimension | Where |
|---|---|
| Per model | `/gateway/usage/by-model` |
| Per workflow / chat session / agent run | `/gateway/usage/by-workflow` (uses the `pfai_<execID>` token’s claim) |
| Per team / per project | Filterable on every endpoint |
| Per user | Available via `executionId` → workflow → owner joins |
| Daily / weekly / monthly | `/gateway/usage/timeline?period=…` |
Token counts split into:
- `promptTokens` — input (prompt + system message + conversation history)
- `completionTokens` — generated output
- `cacheHit` — bool; provider cache hits when supported (Anthropic prompt caching, OpenAI cached completions)
Budget alerts and rate limits
Two related but distinct mechanisms:
| | Budget alerts | Rate limits |
|---|---|---|
| What | Notify when usage crosses a threshold | Block requests above a per-period quota |
| Hard stop? | No — informational only | Yes — returns 429 |
| Configured at | `/api/v1/gateway/budgets` | `/api/v1/gateway/rate-limits` |
| Granularity | Org / team / project | Per API key + per provider |
For hard limits, layer rate limits on top of budget alerts. Budget alerts can be delivered via Slack (through the integration) or the in-app inbox.
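The distinction boils down to two tiny decision functions — one reports, one rejects. These are sketches (names and signatures are illustrative, not the gateway's internals):

```go
package main

import "fmt"

// budgetAlert only reports that a threshold was crossed; the request
// that crossed it still goes through.
func budgetAlert(spentUsd, thresholdUsd float64) bool {
	return spentUsd >= thresholdUsd
}

// rateLimit actually rejects: anything over the per-period quota gets 429.
func rateLimit(requestsThisPeriod, quota int) int {
	if requestsThisPeriod > quota {
		return 429 // hard stop
	}
	return 200
}

func main() {
	fmt.Println(budgetAlert(105.0, 100.0)) // alert fires, nothing is blocked
	fmt.Println(rateLimit(121, 120))       // request is rejected
}
```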
REST endpoints
| Method · Path | Purpose |
|---|---|
| GET `/api/v1/gateway/usage` | Period totals |
| GET `/api/v1/gateway/usage/by-model` | Per-model breakdown |
| GET `/api/v1/gateway/usage/by-workflow` | Attribution |
| GET `/api/v1/gateway/usage/timeline` | Time series |
| GET `/api/v1/gateway/usage/log` | Raw events |
| GET · POST · DELETE `/api/v1/gateway/rate-limits[/{id}]` | Rate limit CRUD |
| GET · POST · DELETE `/api/v1/gateway/budgets[/{id}]` | Budget alert CRUD |
`pfai gateway usage`, `pfai gateway rate-limits`, and `pfai gateway budgets` wrap these.