
Agent budgets

Cap what an agent can spend and do, per hour, day, or month, with the same policy engine that governs every call.

Autonomous agents fail in two expensive ways: they loop (hundreds of calls in minutes) and they spend (model and API costs nobody approved). A budget policy caps both with one rule. The cap is enforced inline by the policy engine, so an over-budget agent is denied (or held for approval) at the gateway, not discovered on an invoice.

How it works

A policy can carry a budget context constraint with three fields:

| Field | Values | Meaning |
|---|---|---|
| metric | calls, cost_usd | What accrues: governed tool calls, or SDK-reported spend in USD |
| window | hour, day, month | A fixed UTC clock window (not a rolling window) |
| gte | number | The rule fires once the agent's window total reaches this cap |

The rule's action decides what happens over budget: deny blocks the call outright, and require_approval holds each further call for a human. Once the agent is over budget, the policy's tool pattern scopes which calls the action applies to. The totals themselves count every governed call by that agent, denied calls included, so a blocked runaway keeps accruing against its window instead of resetting it.
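To make the counting rule concrete, here is a minimal in-memory sketch of a calls-per-hour budget with action deny. Everything in it is hypothetical. The `CallBudget` class is not an AxioRank API. The sketch assumes the predicate is checked before the current call is added to the total. It also uses Python's `fnmatch.fnmatchcase` glob matching for the tool pattern, and AxioRank's actual pattern semantics may differ. The sketch shows the two rules above: every governed call accrues, denied ones included, but only calls that match the tool pattern are denied.

```python
from collections import defaultdict
from datetime import datetime, timezone
from fnmatch import fnmatchcase


class CallBudget:
    """Hypothetical model of one policy: metric=calls, window=hour, action=deny."""

    def __init__(self, gte: int, tool_pattern: str = "*"):
        self.gte = gte
        self.tool_pattern = tool_pattern  # glob semantics assumed (fnmatch)
        # (agent, UTC hour label) -> governed calls seen in that window
        self.totals = defaultdict(int)

    def check(self, agent: str, tool: str, now: datetime) -> str:
        hour = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")
        key = (agent, hour)
        # Assumption: the predicate sees the total *before* this call accrues.
        over_budget = self.totals[key] >= self.gte
        # Every governed call by the agent counts, whatever the tool,
        # and whether or not it ends up denied.
        self.totals[key] += 1
        if over_budget and fnmatchcase(tool, self.tool_pattern):
            return "deny"
        return "allow"


budget = CallBudget(gte=2, tool_pattern="web.*")
t = datetime(2024, 5, 1, 14, 5, tzinfo=timezone.utc)
decisions = [budget.check("research-1", tool, t)
             for tool in ["web.search", "web.search", "db.query", "web.fetch"]]
# decisions == ['allow', 'allow', 'allow', 'deny']; all four calls were counted
```

The db.query call is allowed because it falls outside the pattern, yet it still pushes the window total to 3.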

axiorank.config.json (config-as-code)
```json
{
  "policies": [
    {
      "name": "Cap research-agent velocity",
      "toolPattern": "*",
      "action": "deny",
      "context": {
        "agent": { "labels": { "anyOf": ["team:research"] } },
        "budget": { "metric": "calls", "window": "hour", "gte": 500 }
      }
    },
    {
      "name": "Hold once spend passes $25/day",
      "toolPattern": "*",
      "action": "require_approval",
      "context": {
        "budget": { "metric": "cost_usd", "window": "day", "gte": 25 }
      }
    }
  ]
}
```
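The budget fields form a small closed set, so a config-as-code repository can lint them before deploying. A minimal sketch of such a check follows. `budget_errors` is a hypothetical helper, not part of AxioRank. It enforces only the values listed in the field table (metric, window, numeric gte) and validates nothing else in the policy.

```python
import json

# Allowed values, taken from the budget field table.
BUDGET_METRICS = {"calls", "cost_usd"}
BUDGET_WINDOWS = {"hour", "day", "month"}


def budget_errors(config: dict) -> list:
    """Return human-readable problems with any budget constraints in config."""
    errors = []
    for policy in config.get("policies", []):
        budget = policy.get("context", {}).get("budget")
        if budget is None:
            continue  # policy without a budget constraint: nothing to check
        name = policy.get("name", "<unnamed policy>")
        if budget.get("metric") not in BUDGET_METRICS:
            errors.append(f"{name}: metric must be one of {sorted(BUDGET_METRICS)}")
        if budget.get("window") not in BUDGET_WINDOWS:
            errors.append(f"{name}: window must be one of {sorted(BUDGET_WINDOWS)}")
        gte = budget.get("gte")
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(gte, bool) or not isinstance(gte, (int, float)):
            errors.append(f"{name}: gte must be a number")
    return errors


sample = json.loads(
    '{"policies": [{"name": "typo", "context": '
    '{"budget": {"metric": "tokens", "window": "hour", "gte": 500}}}]}'
)
for problem in budget_errors(sample):
    print(problem)  # typo: metric must be one of ['calls', 'cost_usd']
```

In CI, you would load axiorank.config.json with `json.load` and fail the build when the returned list is non-empty.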

Budget predicates are fail-open, like the ML and novelty predicates: if the metering counters are unreadable, the predicate stays unmet and the call proceeds. A metering outage can never take your agents down.
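In code, fail-open amounts to catching read failures and reporting the predicate as unmet. A minimal sketch follows. It assumes a hypothetical `read_total` callable that fetches the agent's current window total from the metering store. Neither that callable nor `budget_predicate_met` is an AxioRank API.

```python
from typing import Callable


def budget_predicate_met(read_total: Callable[[], float], gte: float) -> bool:
    """Evaluate a budget predicate fail-open."""
    try:
        total = read_total()
    except Exception:
        # Counters unreadable (e.g. metering store down): treat the predicate
        # as unmet, so this budget rule neither denies nor holds the call.
        return False
    return total >= gte


def counter_down() -> float:
    """Stand-in for a metering read that fails."""
    raise ConnectionError("metering store unavailable")


print(budget_predicate_met(lambda: 612, 500))   # True: over budget
print(budget_predicate_met(counter_down, 500))  # False: fail-open
```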

Reporting spend

Call counting is automatic the moment a budget policy exists. Spend (cost_usd) is reported by your SDK, since only your code knows what a turn cost:

LiteLLM: the guardrail computes each turn's model cost natively and attributes it to the first governed tool call of the turn. No configuration is needed:

```python
from axiorank import AsyncAxioRank
from axiorank.integrations.litellm import axiorank_guardrail

proxy_handler_instance = axiorank_guardrail(AsyncAxioRank(api_key="axr_live_..."))
```

TypeScript SDK: pass costUsd on the governed call.

```typescript
const { decision } = await axio.toolCall({
  tool: "web.search",
  arguments: { q: "..." },
  costUsd: turnCost, // e.g. from your model provider's usage callback
});
```

Python SDK: pass cost_usd on the governed call.

```python
result = axio.tool_call("web.search", {"q": "..."}, cost_usd=turn_cost)
```
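Outside LiteLLM, you compute turnCost / turn_cost yourself. The sketch below shows one way to do it. It prices a turn from token counts, then attaches the whole cost to the first governed call of that turn, mirroring what the LiteLLM guardrail is described as doing. The prices, the `Usage` shape, and both helper names are hypothetical. Pass cost_usd (Python) or costUsd (TypeScript) only on the call that carries a non-None value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical prices in USD per million tokens; use your provider's real rates.
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00


@dataclass
class Usage:
    """Token counts from a provider's usage data (field names assumed)."""
    input_tokens: int
    output_tokens: int


def cost_of_turn(usage: Usage) -> float:
    """Price one model turn from its token usage."""
    return (usage.input_tokens * INPUT_USD_PER_M
            + usage.output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def attribute_turn_cost(tools: List[str], turn_cost: float) -> List[Tuple[str, Optional[float]]]:
    """Attach the turn's whole cost to its first governed call only.

    Later calls in the same turn carry no cost, so the turn is counted
    once no matter how many tools it invokes.
    """
    return [(tool, turn_cost if i == 0 else None) for i, tool in enumerate(tools)]


plan = attribute_turn_cost(["web.search", "web.fetch"], cost_of_turn(Usage(1_000, 500)))
# plan[0] carries about 0.0105 USD; plan[1] carries None
```

Attributing to a single call is one plausible way to avoid double counting. It means cost_usd totals track model spend per turn rather than per tool call.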

Observing budgets

  • Each agent's detail page shows its calls this hour, today, and this month, plus reported spend for the month.
  • The first budget denial in an hour raises a triageable budget alert (medium severity) and notifies your channels; further denials in the same window are deduplicated so a runaway agent cannot page you hundreds of times.
  • Every denial is an audit log row like any other decision, exportable and receipt-backed.
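You can picture the alert deduplication described above as a set of windows that have already alerted. A minimal sketch follows. It assumes, though the text does not say so, that deduplication is keyed per agent and per UTC hour. `BudgetAlertDeduper` is a hypothetical name, not an AxioRank API.

```python
from datetime import datetime, timezone


class BudgetAlertDeduper:
    """Raise at most one budget alert per agent per UTC hour (hypothetical)."""

    def __init__(self):
        self._alerted = set()  # (agent, UTC hour label) pairs already alerted

    def should_alert(self, agent: str, denied_at: datetime) -> bool:
        hour = denied_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")
        key = (agent, hour)
        if key in self._alerted:
            return False  # repeat denial in the same hour: suppress
        self._alerted.add(key)
        return True  # first denial this hour: raise the alert, notify channels


dedup = BudgetAlertDeduper()
t = datetime(2024, 5, 1, 14, 10, tzinfo=timezone.utc)
alerts = sum(dedup.should_alert("research-1", t) for _ in range(300))
# alerts == 1: 300 denials, one page
```

This set grows without bound. A long-running implementation would expire keys once their hour has passed.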

Windows are fixed UTC clock windows: an hour budget resets at the top of each UTC hour, a month budget on the first of the month. A cap of 500 per hour therefore allows at most 500 calls between 14:00 and 15:00 UTC, not 500 in any rolling 60 minutes.
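A few lines of arithmetic confirm the boundary behavior. A minimal sketch follows. It assumes calls carry timezone-aware timestamps, and `window_start` and `window_end` are hypothetical helpers, not AxioRank APIs. The sample burst shows the flip side of fixed windows: a 500-per-hour cap can admit up to 1,000 calls in a 60-minute span that straddles an hour boundary.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def window_start(ts: datetime, window: str) -> datetime:
    """Start of the fixed UTC window containing ts (ts must be timezone-aware)."""
    ts = ts.astimezone(timezone.utc)
    if window == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if window == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if window == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown window: {window!r}")


def window_end(ts: datetime, window: str) -> datetime:
    """Moment the budget resets: the start of the next fixed window."""
    start = window_start(ts, window)
    if window == "hour":
        return start + timedelta(hours=1)
    if window == "day":
        return start + timedelta(days=1)
    # month: first of the following month (roll the year over after December)
    if start.month == 12:
        return start.replace(year=start.year + 1, month=1)
    return start.replace(month=start.month + 1)


# 500 calls at 14:59 UTC and 500 at 15:00 UTC land in different hour windows,
# so each window holds only 500, though all 1,000 fall within one minute.
burst = ([datetime(2024, 5, 1, 14, 59, tzinfo=timezone.utc)] * 500
         + [datetime(2024, 5, 1, 15, 0, tzinfo=timezone.utc)] * 500)
per_window = Counter(window_start(t, "hour") for t in burst)
print(max(per_window.values()))  # 500
```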

Velocity caps vs. plan rate limits

Plan rate limits (requests per second) protect the platform and are scoped to your billing tier. Budget policies are your own governance layer: per agent, per window, with policy actions (deny, hold) and audit trails. Use both.
