AgentAssert: Contract Enforcement for AI Agents
- AgentAssert is the reference implementation of the Agent Behavioral Contracts (ABC) framework, specifying preconditions, invariants, governance constraints, and recovery mechanisms.
- It combines low-latency runtime enforcement with probabilistic compliance guarantees and behavioral drift modeling, supporting safety and liveness properties in dynamic AI agent environments.
- The modular Python library supports multi-agent contract composition, real-time monitoring, and empirical benchmarking with sub-10 ms enforcement overhead for efficient agent governance.
AgentAssert is the reference implementation of the Agent Behavioral Contracts (ABC) framework, providing formal, runtime-enforceable behavioral guarantees for autonomous AI agents. It instantiates Design-by-Contract principles for agentic AI, enabling the specification of preconditions, invariants, governance policies, and structured recoveries as first-class artifacts. AgentAssert operationalizes ABC via a modular Python library with low-latency runtime enforcement, supporting rigorous probabilistic satisfaction guarantees, behavioral drift bounding, multi-agent contract composition, and empirical benchmarking.
1. Formal Structure: Agent Behavioral Contracts
At the core of AgentAssert is the ABC contract structure, defined as a 4-tuple $\mathcal{C} = (P, I, G, R)$. Each component is a first-class, runtime-enforceable artifact:
- Preconditions ($P$): Finite set of predicates over the initial agent state required to hold before execution (e.g., "user identity verified").
- Invariants ($I$): Partitioned into:
  - Hard invariants ($I_{\text{hard}}$): Safety-critical properties that must be maintained at every step (e.g., "no PII is emitted," "no unauthorized trades").
  - Soft invariants ($I_{\text{soft}}$): Desirable but recoverable properties (e.g., "response tone remains professional"), requiring restoration within $k$ steps.
- Governance Constraints ($G$): Predicates over agent actions, split into:
  - Hard governance ($G_{\text{hard}}$): Zero-tolerance constraints (e.g., "no forbidden API calls").
  - Soft governance ($G_{\text{soft}}$): Advisory constraints (e.g., cost warnings).
- Recovery Mechanism ($R$): Partial function $R(c, s)$ which, for a violated soft constraint $c$ and state $s$, yields a bounded sequence of corrective actions to reestablish compliance. Failure to recover triggers a RecoveryFailed event.
In this framework, hard constraints operationalize safety ("nothing bad ever happens"), while soft constraints with bounded recovery provide liveness guarantees ("something good happens within $k$ steps").
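The four components above can be pictured as a small data structure. The following is a minimal sketch in plain Python, not AgentAssert's actual API: the names `Constraint`, `Contract`, `violations`, and `recovery_window` are hypothetical, and predicates are assumed to be simple functions over a state dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # assumed: agent state (and latest action) as a plain dict


@dataclass
class Constraint:
    name: str
    check: Callable[[State], bool]  # returns True when the constraint holds
    hard: bool                      # True: zero-tolerance, False: recoverable


@dataclass
class Contract:
    """Hypothetical container mirroring the (P, I, G, R) structure."""
    preconditions: List[Constraint] = field(default_factory=list)
    invariants: List[Constraint] = field(default_factory=list)
    governance: List[Constraint] = field(default_factory=list)
    # Recovery: constraint name -> function proposing corrective actions
    recovery: Dict[str, Callable[[State], List[str]]] = field(default_factory=dict)
    recovery_window: int = 3  # the bound k on recovery steps

    def violations(self, state: State, hard: bool) -> List[str]:
        """Names of violated invariants/governance constraints of one kind."""
        return [c.name for c in self.invariants + self.governance
                if c.hard == hard and not c.check(state)]


contract = Contract(
    preconditions=[Constraint("identity_verified",
                              lambda s: s.get("verified", False), hard=True)],
    invariants=[
        Constraint("no_pii", lambda s: "ssn" not in s.get("output", ""), hard=True),
        Constraint("professional_tone",
                   lambda s: s.get("tone") == "professional", hard=False),
    ],
    recovery={"professional_tone": lambda s: ["reprompt: use a professional tone"]},
)

state = {"verified": True, "output": "Your balance is 10", "tone": "casual"}
hard_violations = contract.violations(state, hard=True)
soft_violations = contract.violations(state, hard=False)
corrective = [a for name in soft_violations for a in contract.recovery[name](state)]
```

Keeping the hard/soft flag on each constraint, rather than in separate classes, makes it easy to route hard violations to blocking and soft violations to recovery.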
2. Probabilistic Satisfaction and Compliance Guarantees
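AgentAssert extends contract satisfaction to stochastic, non-deterministic agents via $(p, \delta, k)$-satisfaction:
Let $C_{\text{hard}}(t)$ and $C_{\text{soft}}(t)$ denote the fraction of hard and soft constraints satisfied at step $t$, for a session of length $T$. An agent $A$ $(p, \delta, k)$-satisfies contract $\mathcal{C}$ (notation: $A \models_{(p,\delta,k)} \mathcal{C}$) if, with probability at least $p$:
- Persistent compliance (Hard guarantee):
$$\forall t \le T:\quad C_{\text{hard}}(t) = 1$$
- Recoverable compliance (Soft guarantee):
$$\forall t \le T:\quad C_{\text{soft}}(t) < 1 - \delta \;\Longrightarrow\; \exists\, t' \in [t,\, t + k]:\ C_{\text{soft}}(t') \ge 1 - \delta$$
Here, $\delta$ bounds tolerable simultaneous soft constraint failures, and $k$ specifies the recovery horizon. These principles correspond to probabilistic computation tree logic (PCTL) statements for safety and liveness under stochastic policy execution.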
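As an illustration, the two guarantees can be checked on recorded compliance traces and the probability estimated as a fraction of sessions. This sketch assumes one particular reading of the soft guarantee (whenever soft compliance falls below 1 − δ, it returns to at least 1 − δ within k steps); the function names and the sample traces are hypothetical.

```python
from typing import List, Sequence, Tuple


def session_satisfies(c_hard: Sequence[float], c_soft: Sequence[float],
                      delta: float, k: int) -> bool:
    """Check the hard and soft guarantees on a single session trace."""
    # Hard guarantee: every hard constraint holds at every step.
    if any(h < 1.0 for h in c_hard):
        return False
    # Soft guarantee: each dip below 1 - delta is repaired within k steps.
    threshold = 1.0 - delta
    for t, value in enumerate(c_soft):
        if value < threshold:
            window = c_soft[t:t + k + 1]
            # A dip with no observed recovery (including one cut off by
            # the end of the session) counts as a failure here.
            if not any(v >= threshold for v in window):
                return False
    return True


def empirical_satisfaction(sessions: List[Tuple[Sequence[float], Sequence[float]]],
                           delta: float, k: int) -> float:
    """Fraction of sessions that meet both guarantees."""
    ok = sum(session_satisfies(h, s, delta, k) for h, s in sessions)
    return ok / len(sessions)


# Hypothetical traces: (hard compliance per step, soft compliance per step)
sessions = [
    ([1, 1, 1, 1], [1.0, 0.5, 1.0, 1.0]),    # soft dip, recovered next step
    ([1, 1, 1, 1], [1.0, 0.5, 0.5, 0.5]),    # soft dip never recovered
    ([1, 0.5, 1, 1], [1.0, 1.0, 1.0, 1.0]),  # hard violation
    ([1, 1, 1, 1], [1.0, 1.0, 1.0, 1.0]),    # fully compliant
]
rate = empirical_satisfaction(sessions, delta=0.2, k=1)
meets_target = rate >= 0.9  # compare against a chosen target p
```

The empirical rate is only an estimate of the probability in the definition; a real evaluation would need enough sessions to put a confidence interval around it.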
3. Behavioral Drift: Modeling and Theoretical Limits
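AgentAssert incorporates behavioral drift modeling to quantify agent deviations from contract compliance. Drift $D(t)$ is modeled as an Ornstein–Uhlenbeck process:
$$dD(t) = \big(\alpha - \gamma\, D(t)\big)\,dt + \sigma\, dW(t)$$
where:
- $\alpha$: natural drift rate (uncontrolled agent divergence),
- $\gamma$: recovery strength (contract enforcement efficacy),
- $\sigma$: process noise amplitude,
- $W(t)$: standard Wiener process.
The Drift Bounds Theorem establishes:
- Stationary drift: $D^* = \alpha / \gamma$,
- Mean drift bound: $\mathbb{E}[D(t)] \le \max\{D(0),\, D^*\}$ for all $t \ge 0$ (enforcement cap),
- Variance: $\operatorname{Var}[D(t)] \to \sigma^2 / (2\gamma)$ as $t \to \infty$,
- Tail bound: in stationarity, $\Pr[D \ge D^* + x] \le \exp(-\gamma x^2 / \sigma^2)$ for $x > 0$,
- Exponential convergence: $\left|\mathbb{E}[D(t)] - D^*\right| = \left|D(0) - D^*\right| e^{-\gamma t}$,
- Contract design criterion: For drift below $D_{\max}$ with probability $1 - \epsilon$, select $\gamma$ such that
$$\frac{\alpha}{\gamma} + \sigma \sqrt{\frac{\ln(1/\epsilon)}{\gamma}} \le D_{\max}$$
with the low-noise limit $\gamma \ge \alpha / D_{\max}$.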
This formalization enables engineering behavioral contracts with explicit drift and recovery guarantees (Bhardwaj, 25 Feb 2026).
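To make the design criterion concrete, the sketch below computes the standard Ornstein–Uhlenbeck quantities and the smallest recovery strength that keeps stationary drift under a target. It assumes the usual mean-reverting form $dD = (\alpha - \gamma D)\,dt + \sigma\,dW$ and a Gaussian Chernoff tail bound for the stationary distribution; the function names and numeric values are hypothetical, not taken from the paper.

```python
import math


def stationary_mean(alpha: float, gamma: float) -> float:
    """Long-run expected drift D* = alpha / gamma."""
    return alpha / gamma


def stationary_variance(sigma: float, gamma: float) -> float:
    """Long-run variance sigma^2 / (2 * gamma)."""
    return sigma ** 2 / (2 * gamma)


def expected_drift(t: float, d0: float, alpha: float, gamma: float) -> float:
    """Mean drift at time t: relaxes exponentially from d0 toward D*."""
    d_star = alpha / gamma
    return d_star + (d0 - d_star) * math.exp(-gamma * t)


def tail_bound(x: float, sigma: float, gamma: float) -> float:
    """Chernoff bound on P[D >= D* + x] for the stationary Gaussian."""
    return math.exp(-gamma * x ** 2 / sigma ** 2)


def min_recovery_strength(alpha: float, sigma: float,
                          d_max: float, eps: float) -> float:
    """Smallest gamma with alpha/gamma + sigma*sqrt(ln(1/eps)/gamma) <= d_max.

    Substituting u = 1/sqrt(gamma) turns the condition into a quadratic in u.
    Requires alpha > 0, d_max > 0, and 0 < eps < 1.
    """
    b = sigma * math.sqrt(math.log(1.0 / eps))
    u = (-b + math.sqrt(b * b + 4.0 * alpha * d_max)) / (2.0 * alpha)
    return 1.0 / (u * u)


# Hypothetical parameters for illustration only
alpha, sigma, d_max, eps = 0.1, 0.05, 0.25, 0.05
gamma_noiseless = min_recovery_strength(alpha, 0.0, d_max, eps)  # alpha / d_max
gamma_noisy = min_recovery_strength(alpha, sigma, d_max, eps)
slack = d_max - stationary_mean(alpha, gamma_noisy)
```

With noise present, the required recovery strength exceeds the low-noise value `alpha / d_max`, because part of the drift budget is spent on absorbing fluctuations around the stationary mean.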
4. Contract Composition in Multi-Agent Chains
AgentAssert supports composition of ABCs for multi-agent deployments, with the following guarantees:
- Serial composition: Given contracts $\mathcal{C}_1$, $\mathcal{C}_2$ for agents $A_1$, $A_2$ and a handoff invariant between them, their serial composition is well-formed if:
- Interface compatibility: 2,
- Assumption discharge: 3,
- Governance consistency: allowed actions of 4 do not conflict with prohibitions in 5,
- Recovery independence: 6 preserves 7.
- Probabilistic degradation: If agent $A_1$ is $(p_1, \delta_1, k_1)$-satisfying and $A_2$ is $(p_2, \delta_2, k_2)$-satisfying, the composed chain $(p, \delta, k)$-satisfies its composed contract with:
3
where 4 are success and drift parameters for the handoff.
For 5-agent chains, 6 and 7. This formalizes multi-agent reliability under ABC/AgentAssert (Bhardwaj, 25 Feb 2026).
5. AgentAssert Architecture and API
AgentAssert is a modular Python library with overhead below 10 ms per action, designed for practical integration and extensibility. The key layers include:
- Parser & Validator: Loads ContractSpec YAML contracts, validates schema consistency.
- Constraint Evaluator: Checks precondition, invariant, and governance predicates ($P$, $I$, $G$) against agent state/action; computes the hard and soft compliance fractions $C_{\text{hard}}(t)$, $C_{\text{soft}}(t)$.
- Metric Tracker: Maintains JSD-based drift $D(t)$, compliance time series, recovery logs, and a stress resilience index.
- Runtime Monitor: Orchestrates enforcement per agent turn: evaluates constraints; updates drift and compliance; emits violation/drift events; handles recovery via $R$ if needed; resets state on re-satisfaction.
- Recovery Executor: Dispatches corrective actions per a taxonomy (LLM re-prompt, tool call, human escalation).
- Integration Hooks: Framework-agnostic adapters for LangChain, AutoGen, and custom agent infrastructure.
- Benchmark Runner: Supports evaluation on synthetic AgentContract-Bench and live agent sessions.
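The JSD-based drift tracked by the Metric Tracker can be illustrated with a plain Jensen–Shannon divergence between two action distributions. The source does not say which distributions AgentAssert compares, so this sketch assumes a baseline window versus a recent window of action types; all names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def distribution(actions: Sequence[str], support: Sequence[str]) -> List[float]:
    """Empirical probability of each action type in `support`."""
    counts = Counter(actions)
    total = len(actions)
    return [counts[a] / total for a in support]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


support = ["search", "reply", "trade"]
baseline = distribution(["search", "reply", "reply", "search"], support)
recent = distribution(["trade", "trade", "reply", "search"], support)

drift = jensen_shannon(baseline, recent)
no_drift = jensen_shannon(baseline, baseline)
max_drift = jensen_shannon([1.0, 0.0], [0.0, 1.0])
```

Using base-2 logarithms keeps the score in the unit interval, which makes a drift threshold easier to interpret across contracts.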
Example API usage (pseudocode):
Under typical enterprise contracts (5 constraints, 6 action types), the total overhead is 5–10 ms per action, much less than LLM inference latency (200–2000 ms) (Bhardwaj, 25 Feb 2026).
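The per-turn behavior described for the Runtime Monitor (evaluate constraints, emit events, track recovery, reset on re-satisfaction) can be sketched as a small state machine. This is an illustrative stand-in rather than AgentAssert's API: `Rule`, `Monitor`, `step`, and the event strings other than the RecoveryFailed name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Rule:
    name: str
    check: Callable[[State], bool]  # True when the rule holds
    hard: bool


@dataclass
class Monitor:
    rules: List[Rule]
    k: int  # recovery window in steps
    pending: Dict[str, int] = field(default_factory=dict)  # rule -> steps violated
    events: List[str] = field(default_factory=list)

    def step(self, state: State) -> bool:
        """Evaluate one agent turn; return False if the turn must be blocked."""
        allowed = True
        for rule in self.rules:
            ok = rule.check(state)
            if rule.hard:
                if not ok:
                    self.events.append(f"HardViolation:{rule.name}")
                    allowed = False
            elif ok:
                # Reset recovery tracking once the soft rule holds again.
                if self.pending.pop(rule.name, None) is not None:
                    self.events.append(f"Recovered:{rule.name}")
            else:
                steps = self.pending.get(rule.name, -1) + 1
                self.pending[rule.name] = steps
                if steps == 0:
                    self.events.append(f"SoftViolation:{rule.name}")
                elif steps == self.k + 1:
                    # Still violated after the k-step window has elapsed.
                    self.events.append(f"RecoveryFailed:{rule.name}")
        return allowed


monitor = Monitor(
    rules=[
        Rule("no_pii", lambda s: "ssn" not in s["output"], hard=True),
        Rule("tone", lambda s: s["tone"] == "professional", hard=False),
    ],
    k=1,
)
trace = [
    {"tone": "casual", "output": "hello"},
    {"tone": "professional", "output": "hello"},
    {"tone": "professional", "output": "ssn: 123"},
    {"tone": "casual", "output": "hello"},
    {"tone": "casual", "output": "hello"},
    {"tone": "casual", "output": "hello"},
]
allowed = [monitor.step(s) for s in trace]
```

In this sketch a hard violation blocks the turn immediately, while a soft violation only starts a countdown; a real monitor would also dispatch the corrective actions during that window.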
6. Benchmarking and Empirical Results
AgentAssert is evaluated on AgentContract-Bench, comprising 200 scenarios across five domains, 50 governance stress profiles, and 50 composition cases. Each scenario consists of a 5–8 step trace with ground-truth violation annotations.
| Metric | Value/Range | Notes |
|---|---|---|
| Detection Accuracy | 1.0000 (all scenarios) | All annotated violations flagged |
| Hard Compliance (7) | 8–9 (domains), 0 (stress), 1 (composition) | Safety-critical constraint adherence |
| Soft Compliance (2) | 3–4 (domains), 5 (composition) | Liveness property adherence |
| Mean Drift (6) | 7–8 (domains), 9 (composition) | Behavioral divergence measure |
| Reliability Index (0) | 1–2 (domains), 3 (composition) | Composite performance score (Def. 3.12) |
| Overhead | <10 ms/action | 5 constraints |
In 1,980 live E1 sessions across seven LLM models and six vendors, contracted agents surface 5.2–6.8 soft violations/session vs. 0.0–0.3/session (uncontracted), with 6 (Welch's 7-test, Bonferroni correction), 8–9, and power 00. Hard constraint compliance under stress is 88–100%; behavioral drift is bounded to 01; 100% recovery is observed for frontier models (02), and 17–100% across all models, all at sub-10 ms enforcement overhead (Bhardwaj, 25 Feb 2026).
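For readers who want to reproduce this kind of comparison on their own session logs, the statistics named above can be computed with the standard library alone. The sketch below uses made-up per-session violation counts, not the paper's data, and the number of comparisons in the Bonferroni step is an assumption; it stops at the test statistic because converting it to a p-value needs a Student-t CDF from a statistics package.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Effect size using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / comparisons


# Hypothetical soft-violation counts per session (illustration only)
contracted = [5, 6, 7, 6, 6]
uncontracted = [0, 0, 1, 0, 0]

t_stat, dof = welch_t(contracted, uncontracted)
effect = cohens_d(contracted, uncontracted)
per_test_alpha = bonferroni_alpha(0.05, comparisons=7)  # assumed: one test per model
```

Welch's test is the natural choice here because the contracted and uncontracted groups have clearly different variances, which the pooled-variance t-test would not tolerate.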
7. Significance and Applications
AgentAssert enables practitioners to specify, enforce, and monitor behavioral properties of autonomous agents with millisecond-level runtime checks, using formal YAML contracts and integration with popular LLM agent frameworks. The provision of probabilistic compliance guarantees and empirical drift bounding supports reliable agentic deployments under uncertainty and across composed workflows. As empirical results on AgentContract-Bench and live LLM deployments show, AgentAssert provides effective detection, high compliance, bounded behavioral drift, and robust recovery, addressing key governance and control gaps in agentic AI systems (Bhardwaj, 25 Feb 2026).