Observability
Observability that knows whether the call was safe.
Latency percentiles, reliability, and per-call cost for every governed agent call, on the very gateway that decides whether to allow it. Other tools show you latency and cost. AxioRank shows you latency and cost plus the decision, the risk, and what data the call touched.
Spend and latency on every plan. Live tail, SLA monitors, and OTLP export on Pro and Team.
What you see
The whole call, not just the metric.
The gateway already records every call with its decision, risk signals, and data lineage. Observability rolls that up, so each chart is decision-aware: read p95 latency and blocked rate off the same governed traffic.
Latency percentiles
p50, p95, and p99 over time and per model, computed in Postgres. Gateway decision time and upstream provider time, separated.
Per-call cost
Cost, tokens, and model on every call. Drill from a daily total down to the one call that cost the most.
Reliability
Throughput and blocked rate (deny plus hold) as a first-class reliability signal. Errors through a security lens.
Model performance
Compare latency, cost per call, token use, and block rate across every model and provider you route to.
Live tail
Watch governed calls arrive in real time, each with its decision, risk, latency, and cost. Pause to inspect.
Latency & cost SLAs
Alert when p95 latency or average cost per call crosses a line, through the same channels as a security alert.
Try it live
Watch a slow tail move p99 while p50 holds.
This runs Postgres percentile_cont, the exact function behind the charts, here in your browser. Inject a slow tail and see what an average would have hidden.
p50 (median): 265 ms
p95: 682 ms
p99: 790 ms
This is Postgres percentile_cont, the exact function behind the /observability charts, run here in your browser. Inject the tail and watch p99 jump while p50 barely moves. That gap is a slow model or a cold start that an average would have hidden, and it is exactly what a p99 SLA monitor catches.
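To make the p50 versus p99 point concrete, here is a minimal sketch of a continuous percentile with linear interpolation, which is how Postgres documents percentile_cont. The function name `percentileCont` and the latency samples are hypothetical, and this is not the code behind the charts.

```typescript
// Continuous percentile with linear interpolation between the two nearest
// samples, following the documented behavior of Postgres percentile_cont.
function percentileCont(values: number[], fraction: number): number {
  if (values.length === 0) throw new Error("no samples");
  if (fraction < 0 || fraction > 1) throw new Error("fraction must be in [0, 1]");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = fraction * (sorted.length - 1); // zero-based fractional rank
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical latencies in ms: 100 calls between 200 and 299 ms.
const baseline: number[] = Array.from({ length: 100 }, (_, i) => 200 + i);

// Same traffic, but the three slowest calls are replaced by a slow tail.
const withTail: number[] = [...baseline.slice(0, 97), 2000, 2500, 3000];

for (const [label, sample] of [["baseline", baseline], ["with tail", withTail]] as const) {
  console.log(
    label,
    "mean", mean(sample).toFixed(1),
    "p50", percentileCont(sample, 0.5).toFixed(1),
    "p99", percentileCont(sample, 0.99).toFixed(1),
  );
}
```

In this sample the median is identical in both runs, the mean shifts only modestly, and p99 rises by roughly an order of magnitude, which is the gap the text describes.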
Cost, per call
Know what a call costs before the invoice does.
AxioRank prices every call from the model and token counts using its catalog, the same math the gateway runs. Pick a model and see the cost of one call, and of a million.
Runs the real AxioRank price catalog in your browser. The gateway uses this exact math to stamp a cost on every call.
$0.0200
$0.005/1k in · $0.025/1k out
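The per-call figure can be reproduced by hand. The sketch below assumes the displayed rates ($0.005 per 1k input tokens, $0.025 per 1k output tokens) and borrows the token counts from the span example on this page (1,500 in, 500 out); whether the widget used those exact counts is an assumption, and `ModelPrice` and `costPerCallUsd` are hypothetical names rather than the AxioRank catalog API.

```typescript
// Hypothetical shape for one catalog entry: USD per 1,000 tokens.
interface ModelPrice {
  inputPer1k: number;
  outputPer1k: number;
}

// Cost of a single call from its token counts and the per-1k rates.
function costPerCallUsd(price: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * price.inputPer1k + (outputTokens / 1000) * price.outputPer1k;
}

// Rates as displayed above; token counts assumed from the span example.
const price: ModelPrice = { inputPer1k: 0.005, outputPer1k: 0.025 };
const oneCall = costPerCallUsd(price, 1500, 500);

console.log(`one call:      $${oneCall.toFixed(4)}`);               // $0.0200
console.log(`million calls: $${(oneCall * 1_000_000).toFixed(2)}`); // $20000.00
```

Under those assumptions, 1.5 × $0.005 plus 0.5 × $0.025 comes to $0.0200 per call, which matches the figure shown.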
Don't get locked in
Your traces, in your stack, carrying the verdict.
Export every governed call to your own collector as an OpenTelemetry span. It carries the standard GenAI attributes and the AxioRank security attributes: the decision, the risk score, and the taint lineage. Edit the call and watch the span change.
Model
Decision
Risk score: 82
Upstream latency (ms)
name: model.completion · trace: aaaaaaaabbbb…
GenAI semantic conventions. Every gateway has these.
- gen_ai.system: anthropic
- gen_ai.request.model: claude-opus-4-8
- gen_ai.response.model: claude-opus-4-8
- gen_ai.usage.input_tokens: 1500
- gen_ai.usage.output_tokens: 500
Security context. Only the gateway that made the decision can emit these.
- axiorank.decision: deny
- axiorank.risk_score: 82
- axiorank.tool: model.completion
- axiorank.agent_id: agent_checkout
- axiorank.workspace_id: ws_demo
- axiorank.audit_log_id: 11111111-2222-3333-4444-555555555555
- axiorank.cost_usd: 0.02
- axiorank.gateway_latency_ms: 6
- axiorank.upstream_latency_ms: 1240
- axiorank.taint_tags: untrusted_source
Built by the real buildOtlpTracesPayload. Route this to Datadog, Grafana, or Honeycomb and your traces carry the verdict, not just the latency.
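For readers wiring up a collector, this is roughly how attributes like the ones above look in OTLP/JSON, where each attribute is a key plus a typed value. It is a sketch under assumptions: `toOtlpAttributes` is a hypothetical helper, not buildOtlpTracesPayload, and a complete span also needs trace and span IDs and timestamps, which are omitted here.

```typescript
// OTLP/JSON typed values. 64-bit integers are written as strings,
// following the protobuf JSON mapping.
type OtlpValue =
  | { stringValue: string }
  | { intValue: string }
  | { doubleValue: number };

interface OtlpKeyValue {
  key: string;
  value: OtlpValue;
}

// Hypothetical helper: turn a flat record into OTLP key/value attributes.
function toOtlpAttributes(attrs: Record<string, string | number>): OtlpKeyValue[] {
  return Object.entries(attrs).map(([key, v]): OtlpKeyValue => {
    if (typeof v === "string") return { key, value: { stringValue: v } };
    if (Number.isInteger(v)) return { key, value: { intValue: String(v) } };
    return { key, value: { doubleValue: v } };
  });
}

// A subset of the attribute values shown in the example span above.
const attributes = toOtlpAttributes({
  "gen_ai.system": "anthropic",
  "gen_ai.request.model": "claude-opus-4-8",
  "gen_ai.usage.input_tokens": 1500,
  "gen_ai.usage.output_tokens": 500,
  "axiorank.decision": "deny",
  "axiorank.risk_score": 82,
  "axiorank.cost_usd": 0.02,
  "axiorank.upstream_latency_ms": 1240,
});

// Span fragment only: name plus attributes.
const spanFragment = { name: "model.completion", attributes };
console.log(JSON.stringify(spanFragment, null, 2));
```

Because the security attributes travel as ordinary span attributes, any backend that accepts OTLP can filter or group on a key such as axiorank.decision without custom parsing.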
The difference
Generic observability stops at latency and cost.
A normal LLM gateway can tell you a call was slow and what it cost. It cannot tell you the call was blocked for a leaked secret, that it touched untrusted data, or hand you a signed proof it was governed. AxioRank can, because the thing measuring the call is the thing that secured it.
Decision-aware
Every metric splits by allow, deny, and hold. See the p95 latency of blocked calls or the cost of held ones.
Taint lineage
A span shows whether untrusted data reached the call, correlated into the kill-chain view on the same trace.
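As an illustration of what splitting a metric by decision means, the sketch below groups a handful of calls by allow, deny, and hold, then derives the blocked rate as deny plus hold over all calls. The `GovernedCall` shape, the function names, and the sample data are hypothetical, not the AxioRank schema.

```typescript
type Decision = "allow" | "deny" | "hold";

// Hypothetical record for one governed call.
interface GovernedCall {
  decision: Decision;
  latencyMs: number;
  costUsd: number;
}

interface DecisionSummary {
  count: number;
  totalCostUsd: number;
  maxLatencyMs: number;
}

// Roll calls up into one summary per decision.
function summarizeByDecision(calls: GovernedCall[]): Record<Decision, DecisionSummary> {
  const summary: Record<Decision, DecisionSummary> = {
    allow: { count: 0, totalCostUsd: 0, maxLatencyMs: 0 },
    deny: { count: 0, totalCostUsd: 0, maxLatencyMs: 0 },
    hold: { count: 0, totalCostUsd: 0, maxLatencyMs: 0 },
  };
  for (const call of calls) {
    const bucket = summary[call.decision];
    bucket.count += 1;
    bucket.totalCostUsd += call.costUsd;
    bucket.maxLatencyMs = Math.max(bucket.maxLatencyMs, call.latencyMs);
  }
  return summary;
}

// Blocked rate: share of calls that were denied or held.
function blockedRate(calls: GovernedCall[]): number {
  if (calls.length === 0) return 0;
  const blocked = calls.filter((c) => c.decision !== "allow").length;
  return blocked / calls.length;
}

// Hypothetical sample traffic.
const calls: GovernedCall[] = [
  { decision: "allow", latencyMs: 300, costUsd: 0.02 },
  { decision: "allow", latencyMs: 280, costUsd: 0.02 },
  { decision: "deny", latencyMs: 1240, costUsd: 0.02 },
  { decision: "hold", latencyMs: 900, costUsd: 0.05 },
];

const byDecision = summarizeByDecision(calls);
console.log(byDecision);
console.log("blocked rate:", blockedRate(calls));
```

The same grouping works for any metric on the page: swap the maximum for a percentile over each bucket's latencies to get the p95 of blocked calls.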
Drop it in
One base URL for cost and latency. One field for the rest.
Point your OpenAI-compatible client at the gateway and spend and latency start flowing on any plan, computed from the response. On the SDK tool-call path, report the execution time you measured and it becomes the call's upstream latency.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://www.axiorank.com/api/proxy/v1",
  defaultHeaders: { "X-AxioRank-Key": process.env.AXIORANK_KEY },
});

// Latency and cost are now captured on every call, no extra code.
await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "hello" }],
});

Keep exploring
Continue across the control plane.
See every agent call, and whether it was safe.
Spend and latency are free to start. Turn on the live tail, SLA monitors, and OpenTelemetry export when you are ready.