Detection intelligence

How a call becomes a verdict beyond the deterministic signals, with ML assessment, a semantic judge, taint provenance, and kill-chain correlation.

The content-inspection engine and your policies decide every call in-band, deterministically. Detection intelligence is the layer around that decision: a model's semantic verdict, value-level provenance, and multi-step correlation. It catches what a single-call regex pass cannot.

Two lanes

The gateway never waits on a model. Detection runs in two lanes:

| What | Lane | Affects |
| --- | --- | --- |
| Content inspection, risk score, redaction | Hot path, in-band | This call's verdict |
| Policy evaluation, including the IFC sink check | Hot path, in-band | This call's verdict |
| ML assessment (the semantic judge) | Post-hoc job | Alerts, response rules, future calls via mlThreatClass |
| Kill-chain correlation | Post-hoc job | Alerts, response rules |

The post-hoc lane never delays a decision. It enriches the record, raises alerts, and feeds the next decision.

ML assessment

After a tool call is evaluated, a background job can send it to AxioRank's model service for a semantic verdict: prompt injection, jailbreak, or exfiltration intent that pattern matching misses. The assessment is gated by five checks, applied in order:

  1. Global config: the model service must be configured for the deployment.
  2. Workspace opt-in: external model egress is off by default; a workspace setting turns it on.
  3. Plan entitlement: AI assessments are a Team and Enterprise feature.
  4. Worth-it gate: a call is always assessed when it carries a critical signal, an ambiguous output-injection flag, or a heuristic risk of 40 or above; otherwise a deterministic 5% sample of benign calls is assessed to keep a baseline of "normal".
  5. Monthly cap: assessments are metered against a per-plan monthly limit, checked before the spend.
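The five gates above can be sketched as a single ordered check. This is a minimal illustration, not AxioRank's implementation: the `AssessmentContext` fields, the lowercase plan names, the returned reason strings, and the hash-bucket sampling in `in_baseline_sample` are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class AssessmentContext:
    # All field names are hypothetical; they only mirror the gates described above.
    call_id: str
    service_configured: bool          # gate 1: global config
    workspace_opted_in: bool          # gate 2: off by default
    plan: str                         # gate 3: plan names are assumed
    critical_signal: bool             # gate 4 inputs
    ambiguous_output_injection: bool
    heuristic_risk: int
    used_this_month: int              # gate 5 inputs
    monthly_cap: int


def in_baseline_sample(call_id: str, percent: int = 5) -> bool:
    """Deterministic sampling: the same call id always lands in the same bucket."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < percent


def assessment_gate(ctx: AssessmentContext) -> str:
    """Return "assess", or the name of the first gate that stopped the call."""
    if not ctx.service_configured:
        return "no_model_service"
    if not ctx.workspace_opted_in:
        return "workspace_not_opted_in"
    if ctx.plan not in {"team", "enterprise"}:
        return "plan_not_entitled"
    worth_it = (
        ctx.critical_signal
        or ctx.ambiguous_output_injection
        or ctx.heuristic_risk >= 40
        or in_baseline_sample(ctx.call_id)
    )
    if not worth_it:
        return "not_worth_it"
    # The cap is checked last, before any spend happens.
    if ctx.used_this_month >= ctx.monthly_cap:
        return "monthly_cap_reached"
    return "assess"
```

Returning the name of the failing gate, rather than a bare boolean, keeps the reason for a skipped assessment visible in the record.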

Only the redacted payload (secrets already masked) ever leaves the platform.

Fail-open by design

If the model service is unreachable, the assessment is recorded as unavailable and nothing else happens. The deterministic decision was already made and returned; ML is enrichment, never a dependency.

A completed verdict carries a calibrated mlRisk (0 to 100), a recommendation (allow · review · block · escalate), and a threatClass: benign · prompt_injection · jailbreak · data_exfiltration · malware · social_engineering · policy_violation · unknown. When the recommendation is block or escalate, or mlRisk is 80 or above, an ml_threat alert is raised through your normal channels. Every persisted verdict fires the ml.assessed webhook and drives ml_* response-rule predicates.
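The alert condition in the paragraph above reduces to one boolean expression. A small sketch, where the function name and the input validation are assumptions added for illustration:

```python
RECOMMENDATIONS = {"allow", "review", "block", "escalate"}


def raises_ml_threat_alert(recommendation: str, ml_risk: int) -> bool:
    """True when a completed verdict should raise an ml_threat alert."""
    if recommendation not in RECOMMENDATIONS:
        raise ValueError(f"unknown recommendation: {recommendation!r}")
    if not 0 <= ml_risk <= 100:
        raise ValueError("mlRisk is expected to be in the range 0 to 100")
    # Either condition alone is enough: a strong recommendation or a high risk.
    return recommendation in {"block", "escalate"} or ml_risk >= 80
```

Note that a verdict with a mild recommendation can still alert on risk alone, for example `review` with an mlRisk of 85.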

A policy can also match on the verdict with the mlThreatClass predicate. Because the verdict is produced asynchronously, the predicate matches the agent's latest completed assessment, not the current call's. It is fail-open: with no verdict on record, the predicate is simply unmet, so a policy never denies on missing ML data.
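The fail-open behavior of the predicate can be sketched as follows. The in-memory `latest_verdicts` mapping and the function name are hypothetical; only the `threatClass` field and the "no verdict means unmet" rule come from the text above.

```python
from typing import Optional

# Hypothetical store: agent id -> that agent's latest completed assessment.
latest_verdicts: dict[str, dict] = {}


def ml_threat_class_predicate(agent_id: str, wanted: set[str]) -> bool:
    """Match against the agent's latest completed verdict, never the current call's."""
    verdict: Optional[dict] = latest_verdicts.get(agent_id)
    if verdict is None:
        # Fail-open: no verdict on record means the predicate is unmet,
        # so a rule built on it cannot deny because ML data is missing.
        return False
    return verdict.get("threatClass") in wanted
```

A deny rule that matches `prompt_injection` therefore starts applying to an agent only after a completed assessment has classified one of its earlier calls that way.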

The semantic judge

The judge's defining job is adjudicating ambiguity. A lone forged role marker or embedded tool directive in a tool output can score below the risk floor even though the deterministic detectors flagged it as a possible injection. That is exactly the "the regex flagged it, but is it real?" case, so a flagged output is always sent for assessment regardless of its risk score.

Confirmed verdicts feed a self-improving loop. When the judge confirms an injection-family threat (prompt_injection, jailbreak, data_exfiltration) with confidence at or above 0.8, and the deterministic layer under-scored it, a second model call generalizes the finding into a reusable custom detector. The proposal is born disabled, marked ai_proposed, capped at 20 AI-proposed detectors per workspace, and metered like any assessment. A human reviews and arms it; a model never enables detection unattended.
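The conditions for generating a proposal can be sketched as a guard plus a constructor. Function names and the dictionary shape are assumptions; how "under-scored" is measured is not specified here, so the sketch takes it as a precomputed boolean, and metering is left out.

```python
INJECTION_FAMILY = {"prompt_injection", "jailbreak", "data_exfiltration"}
MAX_AI_PROPOSED = 20  # per-workspace cap on AI-proposed detectors


def should_propose_detector(
    threat_class: str,
    confidence: float,
    under_scored: bool,
    existing_ai_proposed: int,
) -> bool:
    """True when a confirmed verdict qualifies for a generalization call."""
    return (
        threat_class in INJECTION_FAMILY
        and confidence >= 0.8
        and under_scored
        and existing_ai_proposed < MAX_AI_PROPOSED
    )


def new_proposal(pattern: str) -> dict:
    """A proposal is always created disabled; only a human reviewer arms it."""
    return {"pattern": pattern, "enabled": False, "origin": "ai_proposed"}
```

Keeping `enabled` hard-coded to `False` in the constructor means no code path driven by the model can produce an armed detector.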

Taint provenance

Information-flow control (IFC) tracks values, not just calls. When an untrusted tool returns a result, the payload's string leaves are fingerprinted at ingress: opaque salted hashes over several normalized variants of each leaf (raw, whitespace and case normalized, and base64/hex decoded) so trivial obfuscation does not break the match. The raw value is never stored, only fingerprints.
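Variant fingerprinting of this kind can be sketched with the Python standard library. This is an illustration under stated assumptions, not the platform's algorithm: HMAC-SHA256 stands in for the salted hash, and the exact normalization rules, the `string_leaves` traversal, and all function names are hypothetical.

```python
import base64
import binascii
import hashlib
import hmac


def string_leaves(payload) -> list[str]:
    """Collect every string leaf of a nested dict/list payload."""
    if isinstance(payload, str):
        return [payload]
    if isinstance(payload, dict):
        return [s for v in payload.values() for s in string_leaves(v)]
    if isinstance(payload, (list, tuple)):
        return [s for v in payload for s in string_leaves(v)]
    return []


def variants(leaf: str) -> set[bytes]:
    """Raw, whitespace/case-normalized, and base64/hex-decoded forms of one leaf."""
    out = {leaf.encode("utf-8"), " ".join(leaf.split()).lower().encode("utf-8")}
    compact = leaf.strip()
    try:
        out.add(base64.b64decode(compact, validate=True))
    except (binascii.Error, ValueError):
        pass  # not valid base64
    try:
        out.add(bytes.fromhex(compact))
    except ValueError:
        pass  # not valid hex
    out.discard(b"")
    return out


def fingerprints(leaf: str, salt: bytes) -> set[str]:
    """Opaque salted hashes; the raw value is not recoverable from them."""
    return {hmac.new(salt, v, hashlib.sha256).hexdigest() for v in variants(leaf)}
```

Because the decoded variant of an encoded string equals the raw variant of the original, a value that reappears base64- or hex-encoded in a later call still shares at least one fingerprint with the original.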

Untrusted sources are MCP servers not marked trusted (tag mcp_untrusted) plus tool-name classes: web_fetch, inbound_email, file_read, db_read.

When a later call in the trace is a sink (egress · destructive · state_change), its own argument leaves are fingerprinted and checked against the trace's accumulated untrusted set. An IFC policy rule chooses the propagation mode:

  • explicit: fires only when a tainted value provably reappeared in the sink arguments. Evasion-resistant, and it records which prior step minted the value, so the flow is a provable chain rather than an inference.
  • coarse: fires when any untrusted output was seen earlier in the trace. The high-recall backstop for transformations explicit matching cannot follow.
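The difference between the two modes can be sketched as a single check over the trace's accumulated taint. The data shapes are assumptions: `tainted` maps each untrusted fingerprint to the step that minted it, and the returned dictionary is a hypothetical finding format.

```python
from typing import Optional


def check_sink(
    mode: str,
    sink_fingerprints: set[str],
    tainted: dict[str, int],
) -> Optional[dict]:
    """Return a finding when the rule fires for this sink call, else None."""
    if mode == "explicit":
        # Fires only on a fingerprint match, and records the minting steps.
        hits = {fp: step for fp, step in tainted.items() if fp in sink_fingerprints}
        if not hits:
            return None
        return {"mode": "explicit", "source_steps": sorted(set(hits.values()))}
    if mode == "coarse":
        # Fires whenever any untrusted output exists earlier in the trace.
        if not tainted:
            return None
        return {"mode": "coarse", "source_steps": sorted(set(tainted.values()))}
    raise ValueError(f"unknown propagation mode: {mode!r}")
```

In this sketch a sink whose arguments share nothing with earlier untrusted output passes under `explicit` but still fires under `coarse`, which is the recall trade-off the two modes describe.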

All IFC work is gated on the workspace having an enabled IFC policy; a workspace without one pays nothing. Results proxied through the MCP gateway are fingerprinted automatically. On the SDK path the platform only sees outputs your code reports, so call inspectResult or pass inspectResults: true to a framework adapter to bring tool outputs into the taint trace.

Kill-chain correlation

Single-call scoring misses the most dangerous behavior: a sequence whose steps each look fine alone. After every evaluated call, a post-hoc job loads the run's prior calls (by trace id, or by agent within a window when un-instrumented) and checks whether the just-landed call completes a dangerous ordered pattern over the most recent 20 steps:

| Pattern | Sequence | Severity |
| --- | --- | --- |
| exfiltration | A sensitive read (secret/PII signal or a read-shaped tool), then egress | high; critical when a live secret was seen |
| recon_then_destroy | Three or more reads/lists, then a destructive call | high |
| injection_then_action | An injection signal, then a state-changing or egress call | high |

When the IFC pass proved that a value read at an earlier step reached this call's egress arguments, the finding is marked valueConfirmed and is always critical: the contributing steps are the exact provenance chain, not a heuristic window.
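The three patterns and the valueConfirmed escalation can be sketched as a check that runs when a call lands. The `Step` fields, the `kind` labels, and the choice to count the just-landed call inside the 20-step window are assumptions made for this example.

```python
from dataclasses import dataclass

WINDOW = 20  # most recent steps considered, including the just-landed call


@dataclass
class Step:
    kind: str                      # "read", "list", "egress", "destructive", "state_change", "other"
    secret_or_pii: bool = False    # a secret/PII signal was seen on this step
    live_secret: bool = False
    injection_signal: bool = False


def detect_chains(prior: list[Step], current: Step, value_confirmed: bool = False) -> dict[str, str]:
    """Return {pattern: severity} for every pattern the current call completes."""
    earlier = (prior + [current])[-WINDOW:][:-1]
    findings: dict[str, str] = {}

    if current.kind == "egress":
        sensitive = [s for s in earlier if s.secret_or_pii or s.kind == "read"]
        if sensitive:
            critical = value_confirmed or any(s.live_secret for s in earlier)
            findings["exfiltration"] = "critical" if critical else "high"

    if current.kind == "destructive":
        if sum(1 for s in earlier if s.kind in {"read", "list"}) >= 3:
            findings["recon_then_destroy"] = "high"

    if current.kind in {"state_change", "egress"}:
        if any(s.injection_signal for s in earlier):
            findings["injection_then_action"] = "high"

    return findings
```

Only the final step is tested as the "completing" call, so each pattern is evaluated once, at the moment the dangerous action lands, rather than rescanned over the whole history.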

A denied call never raises a chain alert (the attempt was already blocked), and dedup keeps a long chain to one alert per pattern within a 30-minute cooldown. Findings land as kill_chain alerts in the normal triage lifecycle. A critical exfiltration chain also emits the kill_chain.detected event, which armed response rules can act on (for example, quarantining the agent) and webhooks deliver to your own systems.
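The suppression rules above can be sketched as a small deduper. Keying the cooldown by run and pattern, measuring it from the last raised alert, and the `"deny"` decision label are assumptions for illustration; timestamps are passed in as seconds so the logic stays testable.

```python
COOLDOWN_SECONDS = 30 * 60


class ChainAlertDeduper:
    """Allows at most one alert per (run, pattern) within the cooldown."""

    def __init__(self) -> None:
        self._last_alert: dict[tuple[str, str], float] = {}

    def should_alert(self, run_key: str, pattern: str, decision: str, now: float) -> bool:
        if decision == "deny":
            # The attempt was already blocked, so no chain alert is raised.
            return False
        key = (run_key, pattern)
        last = self._last_alert.get(key)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False
        self._last_alert[key] = now
        return True
```

A different pattern on the same run is tracked under its own key, so a long chain can still surface one alert for each distinct pattern it completes.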
