The seven-stage runtime pipeline that sits between your agents and everything that can go wrong.
Same prompt. Completely different threat. A classifier cannot tell the difference — not because the model is weak, but because the architecture is blind to context.
Sees tokens only. No permissions check. No origin tracking. No intent analysis.
Prompt injected silently through a RAG document. Agent has no payment permissions. Intent does not match declared scope.
We beat enterprise APIs on efficacy and open-source models on latency.
Benchmarks reflect Defender Stage 1 of 7, our custom classifier. Every other tool stops here. We are just getting started. Full pipeline results in private testing.
| Benchmark | Hlyn (Defender) | Lakera (Enterprise API) | ProtectAI (v2 Open Source) | Meta PG2 (Prompt Guard 2) | Azure (Cloud API) | AWS Bedrock (Cloud API) | Benchspan (Reference) |
|---|---|---|---|---|---|---|---|
| Detection Efficacy (Higher F1 / Lower FPR is better) | |||||||
| Qualifire (F1) Direct chat attacks | 0.8886 | 0.748 | 0.6549 | 0.686 | 0.454 | 0.715 | 0.728 |
| InjecAgent (F1) Indirect tool poisoning | 0.99¹ | 0.589 | 0.552 | 0.039 | 0.648 | 0.000 | 0.966 |
| NotInject (FPR) False alarms on safe text | 7.1% | 16.2% | 26.5% | 5.0% | 4.4% | 3.5% | 7.7% |
Prompt Injection detection is available now via the classifier API. Full pipeline coverage for everything else ships next.
| Threat | Prevent vs Contain |
|---|---|
| Prompt Injection Attackers override instructions to hijack the agent's goal. | Prevent Drops hostile semantic intents before the LLM processes them. |
| Indirect Injection Hidden payloads in RAG docs trigger delayed hijacks. | Prevent Sanitizes untrusted context during retrieval to neutralize latent triggers. |
| Data Exfiltration Agent leaks PII, proprietary data, or secrets in its output. | Contain Deterministic egress filtering redacts sensitive patterns in-flight. |
| Tool Auth Agent hallucinates or is persuaded into making unauthorized tool calls. | Prevent Intent-to-tool validation blocks unauthorized API access at execution time. |
| State Contamination A compromised agent attempts lateral movement across the multi-agent system. | Contain Zero-trust boundaries between agents ensure the attack dies where it started and never reaches the orchestrator or peer agents. |
| Agent-to-Agent Propagation A trusted agent is weaponized to attack its orchestrator or sibling agents through poisoned outputs or tool responses. | Prevent Taint tracking across agent boundaries intercepts lateral movement before it reaches the next hop in the pipeline. |