Detection benchmark

One gateway, every attack class, zero false positives

A head-to-head against the open-source guardrails we can actually run, on public and red-team corpora, at a fixed 2% false-positive budget. Single-purpose tools each cover one attack class; AxioRank covers them all at zero false positives, and we publish the harness and every caveat.

The bottom line

  • Breadth. AxioRank caught the attacks across all 4 overt classes we tested (prompt injection, secret exfiltration, PII exfiltration, destructive operations), including a base64 obfuscated key it had to decode. Each competitor covers a single class.
  • Zero false positives. AxioRank flagged 0 of 302 legitimate flows. The regex baseline only reaches its catch rate by blocking roughly one in four legitimate flows, which is over the 2% budget.
  • Where we are honest. On subtle indirect injection (a benign-sounding instruction hidden in tool output), every offline content scanner is weak, ours included. That threat is caught by the gateway's information-flow control, measured in the enforcement benchmark, not by content scanning.

How to read the numbers

Caught at 2% false positives
The share of attacks a tool blocks while keeping benign false positives at or under 2%. A tool that catches attacks only by also blocking legitimate work is marked over budget, because a guardrail that blocks everything is an outage, not a defense. A hold counts as a catch: it stops autonomous execution.
Each tool, its own turf
LLM Guard is scored only on injection, Presidio only on PII: each tool is measured on the classes it is built for, and that scope is stated per panel. AxioRank is scored on every class.
95% CI
The Wilson confidence interval around the catch rate, so a small per-class sample is never dressed up as a precise number.

Every overt attack class, caught at zero false positives

AxioRank sits on top in indigo. A bar drawn in red reaches its height only by exceeding the 2% false-positive budget, so a tall red bar is not a usable result.

Overt prompt injection

Overt instruction-override and SSRF payloads passed into a tool call.

ToolCaught95% CIFalse positives
AxioRank100%[34%, 100%]0.0%
block-all (control)100%[34%, 100%]100.0%over 2% budget
Regex baseline50%[9%, 91%]23.8%over 2% budget
Protect AI LLM Guard0%[0%, 66%]0.0%
allow-all (control)0%[0%, 66%]0.0%

n = 2 attack, 302 benign flows. AxioRank caught 100% at 0.0% false positives. Source: redteam-corpus.

Secret exfiltration

Live credentials in tool arguments, including an AWS key hidden in base64 that a scanner has to decode to see.

ToolCaught95% CIFalse positives
AxioRank100%[44%, 100%]0.0%
block-all (control)100%[44%, 100%]100.0%over 2% budget
Regex baseline67%[21%, 94%]23.8%over 2% budget
allow-all (control)0%[0%, 56%]0.0%

n = 3 attack, 302 benign flows. AxioRank caught 100% at 0.0% false positives. Source: redteam-corpus.

PII exfiltration

Bulk personal data (names, emails, Social Security numbers) sent to an outside host.

ToolCaught95% CIFalse positives
AxioRank100%[34%, 100%]0.0%
Regex baseline100%[34%, 100%]23.8%over 2% budget
block-all (control)100%[34%, 100%]100.0%over 2% budget
Microsoft Presidio0%[0%, 66%]0.0%
allow-all (control)0%[0%, 66%]0.0%

n = 2 attack, 302 benign flows. AxioRank caught 100% at 0.0% false positives. Source: redteam-corpus.

Destructive operations

Schema-destroying and filesystem-destroying operations.

ToolCaught95% CIFalse positives
block-all (control)100%[44%, 100%]100.0%over 2% budget
AxioRank67%[21%, 94%]0.0%
Regex baseline67%[21%, 94%]23.8%over 2% budget
allow-all (control)0%[0%, 56%]0.0%

n = 3 attack, 302 benign flows. AxioRank caught 67% at 0.0% false positives. Source: redteam-corpus.

Where content scanning ends and the gateway begins

A benign-sounding instruction smuggled into the content a tool returns to the agent.On this class, every offline content scanner is weak, AxioRank's detectors included. This is not a scanning problem to tune away: the attack is a legitimate-looking request, so there is no payload to match. AxioRank catches it at the gateway by tracking that the agent read untrusted content before it acted, which the enforcement benchmark measures. We show it here rather than quietly dropping the panel.

ToolCaught95% CIFalse positives
block-all (control)100%[99%, 100%]100.0%over 2% budget
Regex baseline62%[56%, 67%]23.8%over 2% budget
AxioRank5%[3%, 8%]0.0%
Protect AI LLM Guard0%[0%, 1%]0.0%
allow-all (control)0%[0%, 1%]0.0%

n = 300 attack, 302 benign flows. Source: injecagent.

What we ran, and what we did not

We chart only tools we ran end to end on the same corpora. Everything else is named with the reason it could not be run, never estimated.

AxioRankofflineProtect AI LLM Guardmodel downloadMicrosoft Presidiomodel downloadRegex baselineofflineallow-all (control)offlineblock-all (control)offline
  • Rebuff: needs a paid model API and an external vector store; cannot run offline.
  • NVIDIA NeMo Guardrails: its rails invoke an LLM backend; cannot run offline.
  • Vigil: heavy, uncertain local model and YARA setup; not run in this build.

The pre-registered bar

We fix these before the run so the result cannot be rationalized after the fact. We publish the headline only if all hold, and the same harness that produced the chart evaluates them:

  • allow-all blocks nothing, so the harness is not inventing catches.
  • On every overt class, AxioRank stays within the 2% false-positive budget and catches something.
  • On every overt class, no in-budget competitor beats AxioRank. A tool that leads only by exceeding the budget does not count.

The run cleared every condition.

Run provenance

AxioRank is scored through the shipped engine (@axiorank/detectors/node (salted fingerprint + recursive base64/hex/gzip decode)), so the benchmark measures exactly what the gateway default does. Corpora: InjecAgent (600 cases, no LICENSE file in upstream repo; derived fixture not committed), AxioRank red-team corpus (13 cases, MIT). Detector models are pinned: Protect AI LLM Guard (protectai/deberta-v3-base-prompt-injection-v2), Microsoft Presidio (spacy/en_core_web_lg). Gateway commit f857bba.

Omitted, not hidden: Data egress (too few attack cases (n=1 < 2); this class is measured in the enforcement benchmark).

Read the full methodology for the case schema, the adapter contract, and how to reproduce every number, or compare with the gateway enforcement benchmark.