
Billing & Usage

How LLM usage, credit enforcement, and cost reporting work in OSS vs Enterprise — and what's a no-op without account-api.

ProxifAI’s billing surface differs sharply between the OSS build and the Enterprise/SaaS distribution. The OSS build records token usage faithfully but doesn’t enforce credit budgets and reports $0 cost per call (no pricing tables registered). Enterprise wires up account-api for credit enforcement and registers a real cost calculator. This page explains both modes and how to add custom pricing in OSS.

What’s recorded vs what’s enforced

LLM call                            ┌── usage.Event published to NATS LLM_USAGE stream
  │                                 │       (always, OSS or EE)
  ▼                                 │
gateway middleware ──► token count ─┤
                          │         └── pricing.CalculateCost
                          │                ├─ OSS: returns 0
                          │                └─ EE:  current per-model rates

                          └── credit check (optional, before request)
                                ├─ OSS without account-api: no-op (allows)
                                └─ EE/SaaS: ErrInsufficientCredits → 402

Net effect: in OSS, you get accurate token counts and a usage timeline, but the estimatedCostUsd field is always 0 and nothing prevents a runaway workflow from burning provider quota. In Enterprise/SaaS, the same pipeline additionally enforces a USD-denominated credit balance in the LLM gateway middleware.

OSS mode

In a vanilla OSS deployment (no account-api configured), the gateway handles every call as follows:

  1. Authenticates the token (PAT or pfai_<execID>)
  2. Applies the rate limit — 120 req/min, burst 20 — see AI Gateway
  3. Routes to a provider and captures token counts on success
  4. Publishes a usage.Event to the LLM_USAGE NATS JetStream stream
  5. The internal/llmusage consumer then writes one row per event into the gateway_usage table

Reports are queryable:

GET /api/v1/gateway/usage — Period totals (input + output tokens, request count)
GET /api/v1/gateway/usage/by-model — Per-model breakdown
GET /api/v1/gateway/usage/by-workflow — Attribution to chat sessions, agent runs, workflow executions
GET /api/v1/gateway/usage/timeline — Time-series for charts
GET /api/v1/gateway/usage/log — Raw event log

The Settings → AI Gateway page renders these. pfai gateway usage is the CLI surface.

Adding custom pricing in OSS

The internal/llmgateway/pricing package exposes a hook (pricing.go):

import "github.com/proxifai/proxifai-oss/internal/llmgateway/pricing"

func init() {
    pricing.CostCalculator = func(model string, prompt, completion int) float64 {
        rates := map[string]struct{ in, out float64 }{
            "claude-sonnet-4.6": {3.00 / 1e6, 15.00 / 1e6}, // $/token
            "gpt-4o":            {2.50 / 1e6, 10.00 / 1e6},
            // …
        }
        r, ok := rates[model]
        if !ok {
            return 0
        }
        return float64(prompt)*r.in + float64(completion)*r.out
    }
}

Once registered, usage.Event.estimatedCostUsd populates with real USD per call, and the /gateway/usage* endpoints aggregate it. There’s no rate-table UI in OSS — the calculator is a Go function, not config.

Enterprise mode

When account-api is reachable (set ACCOUNT_API_URL or wire a credits.Client), the gateway adds a pre-request credit-check middleware that calls CheckCredit(orgID) before dispatching. If the org has a zero or negative balance and is_enforced is true, the gateway returns 402 Payment Required with {"error":{"message":"insufficient credits"}}.

Per-org credit state from the account-api (credits/client.go):

balance_usd (float) — Current balance
max_drawdown_usd (float) — Soft floor; alerts fire as the balance approaches this
is_enforced (bool) — If false, the org isn't blocked even at zero balance
workflow_cost_per_min_usd (float) — Per-minute compute cost for workflow execution containers
agent_cost_per_min_usd (float) — Per-minute compute cost for agent execution containers

Credit checks are cached for 5 seconds and protected by a circuit breaker (5 failures → 30-second open). The breaker fails open — if account-api is unreachable, requests proceed rather than being blocked. This is intentional: a credit-service outage shouldn’t take down the platform.

What deducts credits

In Enterprise:

LLM gateway calls — token cost via pricing.CalculateCost (registered by EE)
Workflow execution time — workflow_cost_per_min_usd × container minutes
Agent execution time — agent_cost_per_min_usd × container minutes
Storage (artifacts, container volumes) — depends on plan

Compute cost is recorded by the instances reaper (internal/reaper) when containers terminate; it computes elapsed time and emits a usage event that the credit service deducts against.

Usage tracking dimensions

Both OSS and Enterprise report:

Per model — /gateway/usage/by-model
Per workflow / chat session / agent run — /gateway/usage/by-workflow (uses the pfai_<execID> token’s claim)
Per team / per project — filterable on every endpoint
Per user — available via executionId → workflow → owner joins
Daily / weekly / monthly — /gateway/usage/timeline?period=…

Token counts split into:

  • promptTokens — input (prompt + system message + conversation history)
  • completionTokens — generated output
  • cacheHit — bool; provider cache hits when supported (Anthropic prompt caching, OpenAI cached completions)

Budget alerts and rate limits

Two related but distinct mechanisms:

Budget alerts — notify when usage crosses a threshold; informational only, no hard stop; configured at /api/v1/gateway/budgets; granularity: org / team / project.
Rate limits — block requests above a per-period quota; hard stop, returns 429; configured at /api/v1/gateway/rate-limits; granularity: per API key + per provider.

For hard limits, layer rate limits on top of budget alerts. Budget alerts can be delivered either to Slack (via the integration) or to the inbox.

REST endpoints

GET /api/v1/gateway/usage — Period totals
GET /api/v1/gateway/usage/by-model — Per-model breakdown
GET /api/v1/gateway/usage/by-workflow — Attribution
GET /api/v1/gateway/usage/timeline — Time series
GET /api/v1/gateway/usage/log — Raw events
GET · POST · DELETE /api/v1/gateway/rate-limits[/{id}] — Rate limit CRUD
GET · POST · DELETE /api/v1/gateway/budgets[/{id}] — Budget alert CRUD

pfai gateway usage, pfai gateway rate-limits, pfai gateway budgets wrap these.

See also