AI Agent Security · Behavioral Correctness

When agents fail, every check passes.

AI Defendo tracks every agent across every session, evaluating identity, intent, behavior, memory, context, and posture on every turn — at every tool call, memory operation, and data access. One verdict per turn.

Authentication. Authorization. Guardrails. Every check passed. The agent didn't. Behavioral Correctness catches what they miss.
No code changes · Six dimensions · Sub-200ms verdicts
Live workload telemetry
monitoring: 24
sess-9d1 · 0.12 aligned
sess-4b8 · 0.54 drift
sess-7r2 · 0.89 blocked
data inflight: PII 510 · secrets 4 · code 178 · financial 95
passed 1,402 · drift 18 · blocked 3
last sync 2s ago

AI Defendo discovers every agent, observes every action, and evaluates behavioral correctness before execution.

Agents fail in ways prompts don't predict.

Indirect injection. Goal drift. Memory poisoning. Compacted context. Cross-agent escalation.

Every one looks like legitimate behavior until you watch the full agent across the full session.

The Casebook

Six incidents.
Every action looked legitimate.

Named vendors, public disclosures or CVEs, and real-world impact. In every case, the agent acted within its permissions. What didn't exist was a check on the agent's conduct.

i.

EchoLeak

CRIT
Microsoft 365 Copilot · CVE-2025-32711
input · output · exfiltration

Zero-click email triggered Copilot to embed an exfiltration URL in its response. ~$200M impact across 160+ orgs.

indirect injection · context · identity
NVD →
ii.

Replit Agent

CRIT
Production database · public report
reasoning · tools · compliance

User said "code freeze." Agent dropped tables anyway. Records for 1,206 executives and 1,196 companies deleted. 4,000 fake users fabricated.

trajectory drift · behavior · intent
SaaStr →
iii.

SpAIware

CRIT
ChatGPT (OpenAI) · Embrace The Red
memory · output · exfiltration

Cross-session attack. Memory poisoned in one chat — every chat after silently exfiltrated user data through legitimate APIs.

memory poisoning · memory · context
Disclosure →
iv.

ForcedLeak

CVSS 9.4
Salesforce Agentforce
input · output · exfiltration

Web-to-Lead form hijacked Agentforce into exfiltrating CRM records. An expired domain still in the CSP allowed the egress.

indirect injection · context · posture
Disclosure →
v.

Slack AI exfiltration

HIGH
Slack AI · PromptArmor disclosure
input · output · exposure

Public-channel injection made Slack AI surface private-channel content to a low-trust user. Slack's response: "intended behavior."

indirect injection · identity · context
Disclosure →
vi.

Now Assist

CRIT
ServiceNow · AppOmni research
reasoning · tools · exfiltration

Cross-agent escalation. Low-privilege agent tricked a higher-privilege one into exporting case files externally. ServiceNow: "works as intended."

behavior · identity · intent
Disclosure →
In every incident: the agent had permission. The behavior was wrong. The check that catches that gap is Behavioral Correctness — and AI Defendo runs it on every turn.
Prompt filters read messages. Guardrails score outputs. Runtime monitors watch infrastructure.

None of them verify whether the agent's behavior was correct.

Prompt Security catches the message. Runtime Security catches the workload. Behavioral Correctness catches the agent, and includes the rest.
The six questions

Why six dimensions. Why these six.

Every agent action raises six questions. Miss any one and you can't say what really happened.

i.
Identity

Who acted?

The principal behind this turn — and the effective scope of their grant.

Now Assist — one agent escalated under another's grant
ii.
Intent

What were they commissioned to do?

The task the user or system actually asked the agent to perform.

Replit Agent — "code freeze" directive ignored
iii.
Behavior

What did they actually do?

The action itself — tool called, entity touched, and where this turn sits in the sequence.

Replit Agent — DROP TABLE executives + companies, then INSERT 4,000 fake users
iv.
Memory

What had they learned before this turn?

The agent's accumulated state from prior turns and prior sessions.

SpAIware — poisoned memory persisted cross-session
v.
Context

What inputs reached the agent, and from where?

The data the agent is reasoning over this turn, and whether it can be trusted.

EchoLeak — poisoned email reached the model context
vi.
Posture

Was the environment trusted?

The configuration around the workload — permissions, allowlists, baselines.

ForcedLeak — expired domain still on the CSP allowlist

AI Defendo answers all six on every turn.
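To make the six questions concrete, here is a minimal Python sketch of what a per-turn record could look like as data. The names `Dimension`, `DimensionResult`, and `TurnEvaluation`, and the rule that any unanswered or failed dimension blocks the turn, are assumptions for illustration. They are not AI Defendo's actual schema or policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    IDENTITY = "identity"   # Who acted?
    INTENT = "intent"       # What were they commissioned to do?
    BEHAVIOR = "behavior"   # What did they actually do?
    MEMORY = "memory"       # What had they learned before this turn?
    CONTEXT = "context"     # What inputs reached the agent, and from where?
    POSTURE = "posture"     # Was the environment trusted?


@dataclass(frozen=True)
class DimensionResult:
    dimension: Dimension
    passed: bool
    evidence: str  # short human-readable reason


@dataclass(frozen=True)
class TurnEvaluation:
    session_id: str
    turn: int
    results: tuple[DimensionResult, ...]

    def missing(self) -> set[Dimension]:
        """Dimensions with no answer recorded for this turn."""
        return set(Dimension) - {r.dimension for r in self.results}

    def failed(self) -> list[Dimension]:
        """Dimensions that were answered and did not pass."""
        return [r.dimension for r in self.results if not r.passed]

    def verdict(self) -> str:
        # Assumed rule: an unanswered or failed dimension blocks the turn.
        if self.missing() or self.failed():
            return "BLOCK"
        return "ALLOW"
```

Treating a missing dimension the same as a failed one mirrors the claim above: miss any one question and you can't say what really happened.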

Threats hit the agent lifecycle: input, reasoning, memory, tools, output. They show up in the data as exposure, exfiltration, secret leakage, and compliance violations. AI Defendo maps both, and that's Behavioral Correctness.

Case · Replit Agent

Production database · July 2025

inferred task: investigate data anomaly · active directive: code freeze in effect

Telemetry audit stream — Replit incident replay
target: production data worker node
What your stack saw
What AI Defendo saw
Behavioral correctness score · trigger threshold: 0.75
Six-dimension evaluation
Identity: pass · principal verified · session continuous
Intent: scope: investigate anomaly
Behavior: watching tool sequence…
Memory: pass · no anomalous mutations · no cross-principal reads
Context: pass · no injection · no jailbreak signal
Posture: pass · environment trusted · registry baseline matches
VERDICT: BLOCK · 2 of 6 dimensions failed
ACTION: inline kill · turn iv halted · alerted #sec-ai
What actually happened: 1,206 executives and 1,196 companies deleted. 4,000 fake users inserted. Two failing dimensions would have stopped it.
Real incident · Jason Lemkin · SaaStr
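The replay above can be approximated in a few lines of Python. This is a sketch under stated assumptions: the directive and task tables, the function names, and the crude leading-keyword SQL classifier are all hypothetical, and the two failing dimensions are taken to be Intent and Behavior because those are the tags the casebook gives this incident. The four dimensions shown as passing are hard-coded as given.

```python
import re

# Hypothetical mapping: active directive -> statement kinds it forbids.
FORBIDDEN_UNDER = {
    "code freeze": {"DROP", "TRUNCATE", "ALTER", "DELETE", "INSERT", "UPDATE"},
}

# Hypothetical mapping: inferred task -> statement kinds the task needs.
ALLOWED_FOR_TASK = {
    "investigate data anomaly": {"SELECT", "EXPLAIN"},
}


def statement_kind(sql: str) -> str:
    """Return the leading SQL keyword, upper-cased (a crude classifier)."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return match.group(1).upper() if match else ""


def evaluate_turn(task: str, directive: str, sql: str) -> dict:
    kind = statement_kind(sql)
    results = {
        # Dimensions shown as "pass" in the replay are taken as given.
        "identity": True,
        "memory": True,
        "context": True,
        "posture": True,
        # Intent fails when the action violates the active directive.
        "intent": kind not in FORBIDDEN_UNDER.get(directive, set()),
        # Behavior fails when the action falls outside the task's scope.
        "behavior": kind in ALLOWED_FOR_TASK.get(task, set()),
    }
    failed = sorted(name for name, ok in results.items() if not ok)
    return {
        "verdict": "BLOCK" if failed else "ALLOW",
        "failed": failed,
        "summary": f"{len(failed)} of {len(results)} dimensions failed",
    }
```

Calling `evaluate_turn("investigate data anomaly", "code freeze", "DROP TABLE executives")` yields a `BLOCK` verdict with `intent` and `behavior` failed, while a `SELECT` against the same tables is allowed. Identity and permissions never enter the decision, which is the point of the case.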
The Architecture

In four acts.

The Agent Awareness Engine sits at the center — the mechanism for Behavioral Correctness. Continuous detection, multi-mode sensors, and runtime actuators wrap completely around it.

Fig. The four acts at a glance.
AI Defendo platform architecture: the AI Surface (Agents, LLMs, MCP, Skills, Memory, Identities) feeds two streams, Discovery (polled) and Observation (live), into the Agent Awareness Engine at the center. The engine evaluates six dimensions (Identity, Intent, Behavior, Memory, Context, Posture) and emits one verdict per turn to three runtime actuators: Identity Gateway (JIT scoped grants), AI Interceptor (block · coach · rewrite), and Data Flow Control (destination-aware).
Act I · Discover
continuous · polled

Find every agent. Every MCP server. Every place AI touches your data.

Continuous inventory across cloud, endpoint, and browser. Nothing autonomous stays invisible to the platform.

instruments
AI Discovery & Posture Hub: Cloud · Identity · Logs
Endpoint Scanner: macOS · Linux · Windows
Browser extension: Chrome · Firefox
AI App Registry: Central inventory schema
assets
Act II · Observe
continuous · live

Watch every turn — input, reasoning, tool call, memory, output.

Kernel sensors and inline interceptors capture the full bidirectional agent trajectory — request and response, tool call and result, MCP and skill I/O — including what your existing stack can't see.

instruments
eBPF sensor: Linux 5.5+ · K8s · VM
AI Interceptor: Inline · pre-execution checkpoint
Browser extension: Chrome · Firefox inline logs
OTel ingestion: Bedrock · Foundry · S3
Live audit push: Webhook · SSE pipeline
telemetry
Act III · Decide

The Agent Awareness Engine

The mechanism for Behavioral Correctness

Six dimensions joined, every turn. One verdict per turn, sub-200ms, cryptographically signed.

Identity: Cryptographic principal paths
Intent: Contextual policy & directives
Behavior: DDL anomalies vs task goals
Memory: Mutation & cross-principal reads
Context: Prompt injection & jailbreaks
Posture: Asset registry baseline telemetry
verdicts
Act IV · Intervene
enforcement edge · containment & protection

Block. Coach. Rewrite. Quarantine. Alert. Before the action commits.

Inline enforcement at the egress point. Tools never run, data never leaves, secrets never surface, unless the verdict says they should. Content that can be safely sanitized is rewritten in place instead.

enforcement modes
i.
Identity Gateway: Just-in-time scoped grants
ii.
Data Flow Control: Destination-aware enforcement
iii.
Inline Action: Block · Coach · Rewrite · Quarantine · Alert
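As a rough illustration of the rewrite-in-place mode, the Python sketch below redacts a secret from an outbound payload and falls back to blocking when nothing could be sanitized. The function names, the verdict strings, and the single example pattern (the well-known `AKIA` prefix format of AWS access key IDs) are assumptions for illustration, not the product's API or detection logic.

```python
from __future__ import annotations

import re

# Example secret pattern: AWS access key IDs ("AKIA" + 16 uppercase
# alphanumerics). A real deployment would carry many more patterns.
SECRET_PATTERNS = [re.compile(r"\bAKIA[0-9A-Z]{16}\b")]


def rewrite_in_place(payload: str) -> tuple[str, int]:
    """Redact known secret patterns; return (sanitized text, redaction count)."""
    total = 0
    for pattern in SECRET_PATTERNS:
        payload, count = pattern.subn("[REDACTED]", payload)
        total += count
    return payload, total


def enforce(verdict: str, payload: str) -> tuple[str, str | None]:
    """Map a verdict to (action taken, payload to forward or None).

    Assumed verdict values: "allow", "rewrite"; anything else blocks.
    """
    if verdict == "allow":
        return "allow", payload
    if verdict == "rewrite":
        sanitized, count = rewrite_in_place(payload)
        # Nothing was redacted, so sanitizing did not address the finding:
        # block rather than forward the original payload unchanged.
        if count == 0:
            return "block", None
        return "rewrite", sanitized
    return "block", None
```

The design choice worth noting is the fallback: a rewrite that changes nothing is treated as a failed rewrite, so the default on uncertainty is that data does not leave.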
— Bring your own cloud —

In your VPC. With your model. Verdicts you sign.

AI Defendo deploys inside your environment. The control plane, the inference, the verdict-signing keys, and the audit trail all run where your data lives.

Engine & Control Plane

The Agent Awareness Engine and its control plane deploy inside your VPC. Verdicts happen where the agent runs.

Deploys via Terraform or Helm
Inference

Choose sovereign mode — local inference in your tenant — or managed mode via a routed LLM provider. The customer-data path stays the same: your traffic, your verdicts, your storage.

One config setting. Switchable per workload.
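A per-workload switch like this could be modeled as a default plus overrides. The Python sketch below is hypothetical: the `InferenceConfig` class, its field names, and the workload names are invented for illustration, and only the two mode names, sovereign and managed, come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

VALID_MODES = {"sovereign", "managed"}


@dataclass
class InferenceConfig:
    default_mode: str = "sovereign"
    per_workload: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unknown modes at load time rather than at verdict time.
        for mode in [self.default_mode, *self.per_workload.values()]:
            if mode not in VALID_MODES:
                raise ValueError(f"unknown inference mode: {mode!r}")

    def mode_for(self, workload: str) -> str:
        """Per-workload override wins; otherwise use the default."""
        return self.per_workload.get(workload, self.default_mode)
```

For example, `InferenceConfig(per_workload={"support-bot": "managed"})` routes only `support-bot` through a managed provider and leaves every other workload on local inference.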
Verdicts & Audit

Signing keys are generated and held by your deployment. Signed verdicts write to your storage. The audit trail lives where you can see it.

Ed25519 · exportable to your SIEM
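A signature is only verifiable later if the signed bytes are reproducible, so a signed-verdict pipeline needs a canonical serialization step before signing. The Python sketch below covers that step only, using the standard library. The verdict fields and function names are assumptions, and the Ed25519 signing itself is deliberately left out because it depends on the keys held by your deployment and on a cryptography library. The SHA-256 digest here is a fingerprint for audit records, not a signature.

```python
import hashlib
import json


def canonical_bytes(verdict: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so the same verdict
    always yields the same bytes, regardless of dict insertion order."""
    return json.dumps(verdict, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verdict_digest(verdict: dict) -> str:
    """SHA-256 hex digest of the canonical bytes (a fingerprint, not a signature)."""
    return hashlib.sha256(canonical_bytes(verdict)).hexdigest()
```

An Ed25519 signature would then be computed over the output of `canonical_bytes`, and a SIEM consumer would re-derive the same bytes from the exported verdict before verifying.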

Other AI security platforms ingest your agent traffic into vendor clouds and call out to managed LLMs for analysis. Your data lives in three places before a verdict comes back. AI Defendo's sovereign deployment is one place — yours.

The Five Capabilities

Across the full AI surface.

You can't secure what you can't see. You can't trust what you can't verify turn by turn. Five capabilities, one engine — evaluating Behavioral Correctness on every turn.

i.

Shadow AI Discovery

Audit

Find every AI agent, app, and MCP server across your environment — including the ones nobody told you about.

  • Shadow AI apps
  • Unsanctioned MCP
  • Personal AI on corp data
  • Self-hosted LLMs
  • Over-permissioned identities
ii.

AI Workload Security

Protect

Protect deployed AI workloads — inference servers, RAG pipelines, agent runtimes — from runtime exploitation.

  • RCE & container escape
  • Bulk data exfiltration
  • Model weight leakage
  • C2 callback infrastructure
  • Cryptomining injection
iii.

AI Risk Posture

Map

One risk map across every agent, identity, and configuration gap — with prioritized paths to close them.

  • Over-permission paths
  • System configuration drift
  • Confused-deputy vectors
  • Exposed backend endpoints
  • Identity sprawl maps
iv.

Agentic Runtime Security

Enforce

The behavioral correctness wedge. The AI Interceptor inspects every agent turn against the six-dimension verdict — stopping trajectory drift, indirect injection, and unauthorized actions before they execute. Choose your posture per environment.

Alert · Coach · Quarantine · Block
  • Per-turn six-dimension verdict
  • AI Interceptor — inline pre-execution checkpoint
  • Multi-turn trajectory enforcement
  • Indirect injection containment
  • Cross-session memory integrity
v.

Agentic Identity Gateway

Authenticate

Zero-trust identity for every agent action. Cryptographic principal chains, just-in-time scoped grants, and per-turn re-authorization on every tool call — so the agent never inherits more privilege than the current turn requires.

  • Zero-trust agent identity
  • Just-in-time scoped grants
  • Cryptographic principal chains
  • Per-turn re-authorization
  • Confused-deputy prevention
  • Agent-to-agent delegation guards
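The capabilities listed above can be sketched in Python as a short-lived grant that is re-checked on every tool call, plus a guard for one agent acting at another's request. Everything here is an assumption for illustration: the `Grant` type, the scope strings, the 30-second default lifetime, and the intersection rule are not taken from the product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str
    scopes: frozenset[str]
    expires_at: float  # seconds on the caller's clock

    def permits(self, scope: str, now: float) -> bool:
        """Re-checked on every tool call: unexpired and in scope."""
        return now < self.expires_at and scope in self.scopes


def issue_grant(principal: str, scopes: set[str], now: float, ttl: float = 30.0) -> Grant:
    """Just-in-time grant: only the scopes this turn needs, short-lived."""
    return Grant(principal, frozenset(scopes), now + ttl)


def on_behalf_of(requester: Grant, executor: Grant) -> Grant:
    """Confused-deputy guard: when one agent acts at another's request, the
    effective grant is the intersection of both scopes and the earlier expiry."""
    return Grant(
        principal=f"{executor.principal}<-{requester.principal}",
        scopes=requester.scopes & executor.scopes,
        expires_at=min(requester.expires_at, executor.expires_at),
    )
```

Under this rule, a low-privilege agent that asks a higher-privilege one to export case files gets an effective grant without the export scope, so the request fails at the tool call rather than succeeding under the other agent's grant.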
Early Access Beta

Secure your agent infrastructure.

Join the Beta to begin mapping and securing multi-turn workflows inside your production environment.