Observability

Observability that knows whether the call was safe.

Latency percentiles, reliability, and per-call cost for every governed agent call, on the very gateway that decides whether to allow it. Other tools show you latency and cost. AxioRank shows you latency and cost plus the decision, the risk, and what data the call touched.

Spend and latency on every plan. Live tail, SLA monitors, and OTLP export on Pro and Team.

Live tail (streaming)
17:00:04 · model.completion · 1.2 s · $0.021 · deny
17:00:03 · gmail.send · 340 ms · $0.004 · allow
17:00:01 · model.completion · 880 ms · $0.012 · allow
Every row is a governed decision, signed and verifiable offline.
p50 / p95 / p99: latency percentiles, in Postgres
Per call: cost, tokens, and model attributed
OTLP: export to Datadog, Grafana, Honeycomb

What you see

The whole call, not just the metric.

The gateway already records every call with its decision, risk signals, and data lineage. Observability rolls that up, so each chart is decision-aware: read p95 latency and blocked rate off the same governed traffic.

Latency percentiles

p50, p95, and p99 over time and per model, computed in Postgres. Gateway decision time and upstream provider time, separated.

Per-call cost

Cost, tokens, and model on every call. Drill from a daily total down to the one call that cost the most.

Reliability

Throughput and blocked rate (deny plus hold) as a first-class reliability signal. Errors, seen through a security lens.
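
As a rough illustration of the blocked-rate signal, the sketch below counts deny and hold decisions as a share of all calls. The `GovernedCall` shape and the `blockedRate` helper are hypothetical names chosen for this example, not AxioRank's schema; the sample rows reuse the three live-tail entries shown earlier on this page.

```typescript
// Hypothetical record shape for illustration; not the gateway's real schema.
type Decision = "allow" | "deny" | "hold";

interface GovernedCall {
  tool: string;
  latencyMs: number;
  costUsd: number;
  decision: Decision;
}

// Blocked rate = (deny + hold) / total calls. Returns 0 for an empty window.
function blockedRate(calls: GovernedCall[]): number {
  if (calls.length === 0) return 0;
  const blocked = calls.filter((c) => c.decision !== "allow").length;
  return blocked / calls.length;
}

// The three rows from the live tail sample above.
const sampleCalls: GovernedCall[] = [
  { tool: "model.completion", latencyMs: 1200, costUsd: 0.021, decision: "deny" },
  { tool: "gmail.send", latencyMs: 340, costUsd: 0.004, decision: "allow" },
  { tool: "model.completion", latencyMs: 880, costUsd: 0.012, decision: "allow" },
];

const sampleBlockedRate = blockedRate(sampleCalls);
console.log(`blocked rate: ${(sampleBlockedRate * 100).toFixed(1)}%`);
```

One denied call out of three gives a blocked rate of about a third, which is the kind of number a reliability chart would plot next to throughput.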

Model performance

Compare latency, cost per call, token use, and block rate across every model and provider you route to.

Live tail

Watch governed calls arrive in real time, each with its decision, risk, latency, and cost. Pause to inspect.

Latency & cost SLAs

Alert when p95 latency or average cost per call crosses a threshold you set, through the same channels as a security alert.
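
A monitor of this kind boils down to comparing a window's statistics against limits. The sketch below assumes hypothetical `WindowStats` and `SlaMonitor` shapes and a hypothetical `slaBreaches` helper; how AxioRank actually stores monitors and delivers alerts is not described here.

```typescript
// Hypothetical shapes for illustration only.
interface WindowStats {
  p95LatencyMs: number;
  avgCostUsd: number;
}

interface SlaMonitor {
  maxP95LatencyMs?: number; // omit to skip the latency check
  maxAvgCostUsd?: number;   // omit to skip the cost check
}

// Returns one human-readable message per limit that was exceeded.
function slaBreaches(stats: WindowStats, monitor: SlaMonitor): string[] {
  const breaches: string[] = [];
  if (monitor.maxP95LatencyMs !== undefined && stats.p95LatencyMs > monitor.maxP95LatencyMs) {
    breaches.push(`p95 latency ${stats.p95LatencyMs} ms exceeds ${monitor.maxP95LatencyMs} ms`);
  }
  if (monitor.maxAvgCostUsd !== undefined && stats.avgCostUsd > monitor.maxAvgCostUsd) {
    breaches.push(`average cost $${stats.avgCostUsd} exceeds $${monitor.maxAvgCostUsd}`);
  }
  return breaches;
}

// Example window: latency is over its limit, cost is not.
const exampleBreaches = slaBreaches(
  { p95LatencyMs: 682, avgCostUsd: 0.012 },
  { maxP95LatencyMs: 500, maxAvgCostUsd: 0.05 },
);
console.log(exampleBreaches);
```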

Try it live

Watch a slow tail move p99 while p50 holds.

This runs Postgres percentile_cont, the exact function behind the charts, here in your browser. Inject a slow tail and see what an average would have hidden.

Latency distribution (60 calls)

p50 (median): 265 ms
p95: 682 ms
p99: 790 ms

Axis: 0 ms to 890 ms

This is Postgres percentile_cont, the exact function behind the /observability charts, run here in your browser. Inject the tail and watch p99 jump while p50 barely moves. That gap is a slow model or a cold start that an average would have hidden, and it is exactly what a p99 SLA monitor catches.
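
For readers without the interactive demo, here is a minimal TypeScript sketch of the same idea. It mirrors the documented behavior of Postgres `percentile_cont` (linear interpolation between the two nearest sorted values); the `percentileCont` helper and the synthetic latencies are assumptions made for this example, not the page's real data.

```typescript
// Continuous percentile with linear interpolation, following the documented
// semantics of percentile_cont(fraction) WITHIN GROUP (ORDER BY value).
function percentileCont(values: number[], fraction: number): number {
  if (values.length === 0) throw new Error("no values");
  if (fraction < 0 || fraction > 1) throw new Error("fraction must be in [0, 1]");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = fraction * (sorted.length - 1); // zero-based fractional position
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Synthetic data: 60 calls between 200 ms and 290 ms.
const baseline: number[] = Array.from({ length: 60 }, (_, i) => 200 + (i % 10) * 10);

// Inject a slow tail: swap the last two calls for 2.5 s and 3 s responses.
const withTail: number[] = [...baseline.slice(0, 58), 2500, 3000];

for (const [label, data] of [["baseline", baseline], ["with tail", withTail]] as const) {
  console.log(
    label,
    "p50", percentileCont(data, 0.5),
    "p99", Math.round(percentileCont(data, 0.99)),
    "mean", Math.round(mean(data)),
  );
}
```

With this synthetic data the median stays put when the tail is injected, while p99 moves from under 300 ms to well over 2.5 s. The mean shifts by only a few tens of milliseconds, which is why an average alone would not flag the problem.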

Cost, per call

Know what a call costs before the invoice arrives.

AxioRank prices every call from the model and token counts using its catalog, the same math the gateway runs. Pick a model and see the cost of one call, and of a million.

One governed call

Runs the real AxioRank price catalog in your browser. The gateway uses this exact math to stamp a cost on every call.

Cost this call (anthropic): $0.0200

$0.005/1k in · $0.025/1k out

Per 1,000 calls: $20.00
Per 1,000,000 calls: $20,000.00
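
The arithmetic behind those figures can be sketched in a few lines. The rates are the ones shown above; the token counts (1,500 in, 500 out) are borrowed from the span example further down the page, and pairing them with this price is an assumption that happens to reproduce the $0.0200 figure. `ModelPrice` and `callCostUsd` are hypothetical names, not the real catalog API.

```typescript
// Hypothetical price entry: USD per 1,000 tokens, split by direction.
interface ModelPrice {
  inputPer1k: number;
  outputPer1k: number;
}

// Cost of one call = input tokens at the input rate + output tokens at the output rate.
function callCostUsd(price: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * price.inputPer1k + (outputTokens / 1000) * price.outputPer1k;
}

// Rates from the widget above: $0.005/1k in, $0.025/1k out.
const examplePrice: ModelPrice = { inputPer1k: 0.005, outputPer1k: 0.025 };

// Assumed token counts: 1,500 input and 500 output.
const perCall = callCostUsd(examplePrice, 1500, 500);

console.log(`one call:        $${perCall.toFixed(4)}`);
console.log(`1,000 calls:     $${(perCall * 1_000).toFixed(2)}`);
console.log(`1,000,000 calls: $${(perCall * 1_000_000).toFixed(2)}`);
```

Input contributes 1.5 × $0.005 = $0.0075 and output contributes 0.5 × $0.025 = $0.0125, so output tokens drive most of the cost even though there are fewer of them.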

Don't get locked in

Your traces, in your stack, carrying the verdict.

Export every governed call to your own collector as an OpenTelemetry span. It carries the standard GenAI attributes and the AxioRank security attributes: the decision, the risk score, and the taint lineage. Edit the call and watch the span change.

The governed call

Model · Decision · Risk score: 82 · Upstream latency (ms)

OTLP span: ERROR · secret detected in model output

name model.completion · trace aaaaaaaabbbb

gen_ai.*

GenAI semantic conventions. Every gateway has these.

gen_ai.system: anthropic
gen_ai.request.model: claude-opus-4-8
gen_ai.response.model: claude-opus-4-8
gen_ai.usage.input_tokens: 1500
gen_ai.usage.output_tokens: 500

axiorank.*

Security context. Only the gateway that made the decision can emit these.

axiorank.decision: deny
axiorank.risk_score: 82
axiorank.tool: model.completion
axiorank.agent_id: agent_checkout
axiorank.workspace_id: ws_demo
axiorank.audit_log_id: 11111111-2222-3333-4444-555555555555
axiorank.cost_usd: 0.02
axiorank.gateway_latency_ms: 6
axiorank.upstream_latency_ms: 1240
axiorank.taint_tags: untrusted_source

Built by the real buildOtlpTracesPayload. Route this to Datadog, Grafana, or Honeycomb and your traces carry the verdict, not just the latency.
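
To make the span shape concrete, the sketch below turns a flat attribute map into the key/value list that OTLP's JSON encoding uses for span attributes. It is not the real `buildOtlpTracesPayload`; `toOtlpAttributes` and its types are hypothetical, and only a few of the attributes listed above are included. Integers are emitted as decimal strings, following the OTLP/JSON convention for 64-bit integers.

```typescript
// Flat attribute values this sketch supports.
type AttrValue = string | number | string[];

// Subset of the OTLP AnyValue JSON shape.
type OtlpAnyValue =
  | { stringValue: string }
  | { intValue: string }
  | { doubleValue: number }
  | { arrayValue: { values: OtlpAnyValue[] } };

interface OtlpKeyValue {
  key: string;
  value: OtlpAnyValue;
}

function toAnyValue(v: AttrValue): OtlpAnyValue {
  if (typeof v === "string") return { stringValue: v };
  if (Array.isArray(v)) return { arrayValue: { values: v.map((s) => toAnyValue(s)) } };
  // Numbers: integers become intValue (as a string), everything else doubleValue.
  return Number.isInteger(v) ? { intValue: String(v) } : { doubleValue: v };
}

// Hypothetical helper: converts a flat record into OTLP span attributes.
function toOtlpAttributes(attrs: Record<string, AttrValue>): OtlpKeyValue[] {
  return Object.entries(attrs).map(([key, v]) => ({ key, value: toAnyValue(v) }));
}

// A few of the attributes from the example span above.
const spanAttributes = toOtlpAttributes({
  "gen_ai.system": "anthropic",
  "gen_ai.usage.input_tokens": 1500,
  "axiorank.decision": "deny",
  "axiorank.risk_score": 82,
  "axiorank.cost_usd": 0.02,
  "axiorank.taint_tags": ["untrusted_source"],
});

console.log(JSON.stringify(spanAttributes, null, 2));
```

Because the security context travels as ordinary span attributes, any backend that can filter on attributes can, in principle, filter on `axiorank.decision` the same way it filters on `gen_ai.system`.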

The difference

Generic observability stops at latency and cost.

A normal LLM gateway can tell you a call was slow and what it cost. It cannot tell you the call was blocked for a leaked secret, that it touched untrusted data, or hand you a signed proof it was governed. AxioRank can, because the thing measuring the call is the thing that secured it.

Decision-aware

Every metric splits by allow, deny, and hold. See the p95 latency of blocked calls or the cost of held ones.

Taint lineage

A span shows whether untrusted data reached the call, correlated into the kill-chain view on the same trace.

Drop it in

One base URL for cost and latency. One field for the rest.

Point your OpenAI-compatible client at the gateway and spend and latency start flowing on any plan, computed from the response. On the SDK tool-call path, report the execution time you measured and it becomes the call's upstream latency.

import OpenAI from "openai";

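// The OpenAI SDK reads its apiKey from process.env.OPENAI_API_KEY by default.
const client = new OpenAI({
  // Route requests through the gateway instead of calling the provider directly.
  baseURL: "https://www.axiorank.com/api/proxy/v1",
  defaultHeaders: { "X-AxioRank-Key": process.env.AXIORANK_KEY },
});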

// Latency and cost are now captured on every call, no extra code.
await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "hello" }],
});

See every agent call, and whether it was safe.

Spend and latency are free to start. Turn on the live tail, SLA monitors, and OpenTelemetry export when you are ready.