Detection benchmark
One gateway, every attack class, zero false positives
A head-to-head against the open-source guardrails we can actually run, on public and red-team corpora, at a fixed 2% false-positive budget. Single-purpose tools each cover one attack class; AxioRank covers them all at zero false positives, and we publish the harness and every caveat.
The bottom line
- Breadth. AxioRank caught attacks in all 4 overt classes we tested (prompt injection, secret exfiltration, PII exfiltration, destructive operations): 100% in three of them and 67% on destructive operations. That includes a base64-obfuscated key it had to decode. Each competitor covers a single class.
- Zero false positives. AxioRank flagged 0 of 302 legitimate flows. The regex baseline only reaches its catch rate by blocking roughly one in four legitimate flows, which is over the 2% budget.
- Where we are honest. On subtle indirect injection (a benign-sounding instruction hidden in tool output), every offline content scanner is weak, ours included. That threat is caught by the gateway's information-flow control, measured in the enforcement benchmark, not by content scanning.
How to read the numbers
- Caught at 2% false positives. The share of attacks a tool blocks while keeping benign false positives at or under 2%. A tool that catches attacks only by also blocking legitimate work is marked over budget, because a guardrail that blocks everything is an outage, not a defense. A hold counts as a catch: it stops autonomous execution.
- Each tool, its own turf. LLM Guard is scored only on injection, Presidio only on PII: each tool is measured on the classes it is built for, and that scope is stated per panel. AxioRank is scored on every class.
- 95% CI. The Wilson confidence interval around the catch rate, so a small per-class sample is never dressed up as a precise number.
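To make the interval concrete, here is a minimal sketch of the Wilson score interval. The function name `wilson_interval` and the choice of z = 1.96 for a 95% interval are assumptions, not taken from the published harness. With that z, the sketch gives [34%, 100%] for 2 of 2 attacks caught and [21%, 94%] for 2 of 3, in line with the tables on this page.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 is the usual two-sided 95% value (an assumption here).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] so rounding noise never leaves the valid range.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    for caught, total in [(2, 2), (2, 3), (1, 2)]:
        lo, hi = wilson_interval(caught, total)
        print(f"{caught}/{total}: [{lo:.0%}, {hi:.0%}]")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and remains wide when n is 2 or 3, which is why a 100% catch rate on two cases still carries a lower bound near 34%.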
Every overt attack class, caught at zero false positives
In the charts, AxioRank is drawn in indigo. A bar drawn in red reaches its height only by exceeding the 2% false-positive budget, so a tall red bar is not a usable result. In the tables below, those results are marked "over 2% budget".
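The budget rule can be sketched in a few lines. This is an illustration only: the decision labels (`"allow"`, `"hold"`, `"block"`), the `Score` type, and the `score` function are hypothetical names, and treating a hold on a benign flow as a false positive is an assumption rather than something the harness documentation confirms.

```python
from dataclasses import dataclass

FP_BUDGET = 0.02                # the fixed 2% false-positive budget
CATCHING = {"block", "hold"}    # a hold counts as a catch


@dataclass
class Score:
    caught: float
    false_positive_rate: float
    over_budget: bool


def score(attack_decisions, benign_decisions, budget=FP_BUDGET):
    """Score one tool on one panel from its per-flow decisions."""
    caught = sum(d in CATCHING for d in attack_decisions) / len(attack_decisions)
    # Assumption: any non-allow decision on a benign flow is a false positive.
    fpr = sum(d in CATCHING for d in benign_decisions) / len(benign_decisions)
    return Score(caught, fpr, over_budget=fpr > budget)


if __name__ == "__main__":
    # A block-all control catches everything and is unusable.
    print(score(["block"] * 3, ["block"] * 10))
    # A tool that holds both attacks and flags no benign flow is in budget.
    print(score(["hold", "hold"], ["allow"] * 10))
```

Keeping `over_budget` as a separate field, instead of zeroing the catch rate, lets a report show the raw number while still flagging it as unusable.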
Overt prompt injection
Overt instruction-override and SSRF payloads passed into a tool call.
| Tool | Caught | 95% CI | False positives |
|---|---|---|---|
| AxioRank | 100% | [34%, 100%] | 0.0% |
| block-all (control) | 100% | [34%, 100%] | 100.0% (over 2% budget) |
| Regex baseline | 50% | [9%, 91%] | 23.8% (over 2% budget) |
| Protect AI LLM Guard | 0% | [0%, 66%] | 0.0% |
| allow-all (control) | 0% | [0%, 66%] | 0.0% |
n = 2 attack, 302 benign flows. AxioRank caught 100% at 0.0% false positives. Source: redteam-corpus.
Secret exfiltration
Live credentials in tool arguments, including an AWS key hidden in base64 that a scanner has to decode to see.
| Tool | Caught | 95% CI | False positives |
|---|---|---|---|
| AxioRank | 100% | [44%, 100%] | 0.0% |
| block-all (control) | 100% | [44%, 100%] | 100.0% (over 2% budget) |
| Regex baseline | 67% | [21%, 94%] | 23.8% (over 2% budget) |
| allow-all (control) | 0% | [0%, 56%] | 0.0% |
n = 3 attack, 302 benign flows. AxioRank caught 100% at 0.0% false positives. Source: redteam-corpus.
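To show why a scanner has to decode before it can match, here is a minimal sketch of recursive decoding over base64, hex, and gzip layers. It is not the `@axiorank/detectors` implementation: every name below (`contains_aws_key`, `_decoded_layers`, the token regexes, the depth limit of 3) is hypothetical, and the only secret pattern checked is the `AKIA` access key ID prefix, using AWS's documented example key.

```python
import base64
import binascii
import gzip
import re
import zlib

# Access key IDs for long-term AWS credentials: "AKIA" plus 16 characters.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")
# Candidate encoded runs (hypothetical thresholds, chosen to skip short words).
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_TOKEN = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")


def _to_text(raw: bytes) -> str:
    """Unwrap a gzip layer if present, then map bytes to text losslessly."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        try:
            raw = gzip.decompress(raw)
        except (OSError, EOFError, zlib.error):
            pass  # not valid gzip after all; scan the bytes as they are
    return raw.decode("latin-1")


def _decoded_layers(text: str):
    """Yield every string found one encoding layer below `text`."""
    for match in B64_TOKEN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            continue  # looked like base64 but was not
        yield _to_text(raw)
    for match in HEX_TOKEN.finditer(text):
        yield _to_text(bytes.fromhex(match.group(0)))


def contains_aws_key(text: str, depth: int = 3) -> bool:
    """True if an access key ID appears in `text` or under up to `depth` layers."""
    if AWS_KEY_ID.search(text):
        return True
    if depth == 0:
        return False
    return any(contains_aws_key(layer, depth - 1) for layer in _decoded_layers(text))


if __name__ == "__main__":
    hidden = base64.b64encode(b"AKIAIOSFODNN7EXAMPLE").decode()
    print(contains_aws_key(f'{{"body": "{hidden}"}}'))  # key found after one decode
```

The depth limit bounds the work an attacker can force with deeply nested encodings. A production scanner would also need to cap decompressed size, which this sketch does not do.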
PII exfiltration
Bulk personal data (names, emails, Social Security numbers) sent to an outside host.
| Tool | Caught | 95% CI | False positives |
|---|---|---|---|
| AxioRank | 100% | [34%, 100%] | 0.0% |
| Regex baseline | 100% | [34%, 100%] | 23.8% (over 2% budget) |
| block-all (control) | 100% | [34%, 100%] | 100.0% (over 2% budget) |
| Microsoft Presidio | 0% | [0%, 66%] | 0.0% |
| allow-all (control) | 0% | [0%, 66%] | 0.0% |
n = 2 attack, 302 benign flows. AxioRank caught 100% at 0.0% false positives. Source: redteam-corpus.
Destructive operations
Schema-destroying and filesystem-destroying operations.
| Tool | Caught | 95% CI | False positives |
|---|---|---|---|
| block-all (control) | 100% | [44%, 100%] | 100.0% (over 2% budget) |
| AxioRank | 67% | [21%, 94%] | 0.0% |
| Regex baseline | 67% | [21%, 94%] | 23.8% (over 2% budget) |
| allow-all (control) | 0% | [0%, 56%] | 0.0% |
n = 3 attack, 302 benign flows. AxioRank caught 67% at 0.0% false positives. Source: redteam-corpus.
Where content scanning ends and the gateway begins
A benign-sounding instruction smuggled into the content a tool returns to the agent. On this class, every offline content scanner is weak, AxioRank's detectors included. This is not a scanning problem to tune away: the attack is a legitimate-looking request, so there is no payload to match. AxioRank catches it at the gateway by tracking that the agent read untrusted content before it acted, which the enforcement benchmark measures. We show it here rather than quietly dropping the panel.
| Tool | Caught | 95% CI | False positives |
|---|---|---|---|
| block-all (control) | 100% | [99%, 100%] | 100.0% (over 2% budget) |
| Regex baseline | 62% | [56%, 67%] | 23.8% (over 2% budget) |
| AxioRank | 5% | [3%, 8%] | 0.0% |
| Protect AI LLM Guard | 0% | [0%, 1%] | 0.0% |
| allow-all (control) | 0% | [0%, 1%] | 0.0% |
n = 300 attack, 302 benign flows. Source: injecagent.
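The idea of tracking that the agent read untrusted content before it acted can be illustrated with a toy taint flag. This is a conceptual sketch only and makes no claim about how the AxioRank gateway implements information-flow control: the `Session` class, its method names, and the trusted/sensitive flags are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-agent session state for a toy information-flow check."""
    tainted: bool = False
    log: list = field(default_factory=list)

    def observe_tool_output(self, source: str, trusted: bool) -> None:
        # Reading untrusted content taints everything the agent does afterwards.
        if not trusted:
            self.tainted = True
        self.log.append(("read", source, trusted))

    def authorize(self, action: str, sensitive: bool) -> str:
        # The decision depends on what was read earlier, not on the request text.
        decision = "hold" if (sensitive and self.tainted) else "allow"
        self.log.append(("act", action, decision))
        return decision


if __name__ == "__main__":
    session = Session()
    print(session.authorize("send_email", sensitive=True))    # allow
    session.observe_tool_output("web_page", trusted=False)
    print(session.authorize("send_email", sensitive=True))    # hold
    print(session.authorize("read_calendar", sensitive=False))  # allow
```

Because the check keys on the order of events, it does not need the injected instruction to look malicious, which is the gap a content scanner cannot close.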
What we ran, and what we did not
We chart only tools we ran end to end on the same corpora. Everything else is named with the reason it could not be run, never estimated.
- Rebuff: needs a paid model API and an external vector store; cannot run offline.
- NVIDIA NeMo Guardrails: its rails invoke an LLM backend; cannot run offline.
- Vigil: heavy, uncertain local model and YARA setup; not run in this build.
The pre-registered bar
We fix these before the run so the result cannot be rationalized after the fact. We publish the headline only if all hold, and the same harness that produced the chart evaluates them:
- allow-all blocks nothing, so the harness is not inventing catches.
- On every overt class, AxioRank stays within the 2% false-positive budget and catches something.
- On every overt class, no in-budget competitor beats AxioRank. A tool that leads only by exceeding the budget does not count.
The run cleared every condition.
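The three conditions can be written as a single check over the panel results. The sketch below is not the published harness: `clears_bar`, `in_budget`, and the `RESULTS` layout are hypothetical, and the (caught, false-positive rate) pairs are copied from the four overt panels on this page.

```python
FP_BUDGET = 0.02

# (caught, false-positive rate) per tool, copied from the overt panels.
RESULTS = {
    "Overt prompt injection": {
        "AxioRank": (1.00, 0.000),
        "block-all": (1.00, 1.000),
        "Regex baseline": (0.50, 0.238),
        "Protect AI LLM Guard": (0.00, 0.000),
        "allow-all": (0.00, 0.000),
    },
    "Secret exfiltration": {
        "AxioRank": (1.00, 0.000),
        "block-all": (1.00, 1.000),
        "Regex baseline": (0.67, 0.238),
        "allow-all": (0.00, 0.000),
    },
    "PII exfiltration": {
        "AxioRank": (1.00, 0.000),
        "Regex baseline": (1.00, 0.238),
        "block-all": (1.00, 1.000),
        "Microsoft Presidio": (0.00, 0.000),
        "allow-all": (0.00, 0.000),
    },
    "Destructive operations": {
        "block-all": (1.00, 1.000),
        "AxioRank": (0.67, 0.000),
        "Regex baseline": (0.67, 0.238),
        "allow-all": (0.00, 0.000),
    },
}


def in_budget(false_positive_rate: float) -> bool:
    return false_positive_rate <= FP_BUDGET


def clears_bar(results: dict, subject: str = "AxioRank") -> bool:
    """Check the three pre-registered conditions on every overt panel."""
    for tools in results.values():
        # 1. allow-all blocks nothing, so the harness is not inventing catches.
        if tools["allow-all"] != (0.0, 0.0):
            return False
        # 2. The subject stays within budget and catches something.
        caught, fp = tools[subject]
        if not in_budget(fp) or caught <= 0:
            return False
        # 3. No in-budget competitor beats the subject.
        for name, (other_caught, other_fp) in tools.items():
            if name != subject and in_budget(other_fp) and other_caught > caught:
                return False
    return True


if __name__ == "__main__":
    print(clears_bar(RESULTS))
```

On destructive operations, block-all leads at 100% but only at a 100% false-positive rate, so under condition 3 it does not count against AxioRank's 67%.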
Run provenance
AxioRank is scored through the shipped engine, @axiorank/detectors/node (salted fingerprint + recursive base64/hex/gzip decode), so the benchmark measures exactly what the gateway default does. Corpora: InjecAgent (600 cases, no LICENSE file in upstream repo; derived fixture not committed), AxioRank red-team corpus (13 cases, MIT). Detector models are pinned: Protect AI LLM Guard (protectai/deberta-v3-base-prompt-injection-v2), Microsoft Presidio (spacy/en_core_web_lg). Gateway commit f857bba.
Omitted, not hidden: data egress, which has too few attack cases to chart (n = 1, below the minimum of 2). This class is measured in the enforcement benchmark.
Read the full methodology for the case schema, the adapter contract, and how to reproduce every number, or compare with the gateway enforcement benchmark.