
AgentAssert: Contract Enforcement for AI Agents

Updated 2 July 2026
  • AgentAssert is the reference implementation of the Agent Behavioral Contracts (ABC) framework, specifying preconditions, invariants, governance constraints, and recovery mechanisms.
  • It pairs low-latency runtime enforcement with probabilistic compliance guarantees and behavioral drift modeling to support safety and liveness properties in dynamic AI agent environments.
  • The modular Python library supports multi-agent contract composition, real-time monitoring, and empirical benchmarking, with enforcement overhead below 10 ms per action.

AgentAssert is the reference implementation of the Agent Behavioral Contracts (ABC) framework, providing formal, runtime-enforceable behavioral guarantees for autonomous AI agents. It instantiates Design-by-Contract principles for agentic AI, enabling the specification of preconditions, invariants, governance policies, and structured recoveries as first-class artifacts. AgentAssert operationalizes ABC via a modular Python library with low-latency runtime enforcement, supporting rigorous probabilistic satisfaction guarantees, behavioral drift bounding, multi-agent contract composition, and empirical benchmarking.

1. Formal Structure: Agent Behavioral Contracts

At the core of AgentAssert is the ABC contract structure, defined as a 4-tuple $C = (P, I, G, R)$. Each component is a first-class, runtime-enforceable artifact:

  • Preconditions ($P$): Finite set of predicates $p_1, \ldots, p_m$ over the initial agent state $s_0$ required to hold before execution (e.g., "user identity verified").
  • Invariants ($I$): Partitioned into:
    • Hard invariants ($I_{\mathrm{hard}}$): Safety-critical properties that must be maintained at every step (e.g., "no PII is emitted," "no unauthorized trades").
    • Soft invariants ($I_{\mathrm{soft}}$): Desirable but recoverable properties (e.g., "response tone remains professional"), requiring restoration within $k$ steps.
  • Governance Constraints ($G$): Predicates over agent actions $a_t$, split into:
    • Hard governance ($G_{\mathrm{hard}}$): Zero-tolerance constraints (e.g., "no forbidden API calls").
    • Soft governance ($G_{\mathrm{soft}}$): Advisory constraints (e.g., cost warnings).
  • Recovery Mechanism ($R$): Partial function that, for a violated soft constraint $c$ and state $s$, yields a bounded sequence of corrective actions to reestablish compliance. Failure to recover triggers a RecoveryFailed event.

In this framework, hard constraints operationalize safety ("nothing bad ever happens"), while soft constraints with bounded recovery provide liveness guarantees ("something good happens within $k$ steps").
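
The 4-tuple can be pictured as a small data structure. The following is a minimal sketch, not AgentAssert's actual API: the names `Constraint`, `Contract`, `violated`, and `recover` are hypothetical, and agent state (including the pending action) is assumed to be a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, object]          # assumed: state and pending action in one dict
Predicate = Callable[[State], bool]


@dataclass
class Constraint:
    name: str
    predicate: Predicate
    hard: bool = True              # hard = zero tolerance, soft = recoverable


@dataclass
class Contract:
    preconditions: List[Constraint] = field(default_factory=list)   # P
    invariants: List[Constraint] = field(default_factory=list)      # I
    governance: List[Constraint] = field(default_factory=list)      # G
    # R is partial: only some soft constraints have a recovery routine.
    recovery: Dict[str, Callable[[State], List[str]]] = field(default_factory=dict)

    def violated(self, state: State, hard: bool) -> List[str]:
        """Names of violated invariants/governance constraints of one kind."""
        checks = self.invariants + self.governance
        return [c.name for c in checks if c.hard == hard and not c.predicate(state)]

    def recover(self, name: str, state: State) -> Optional[List[str]]:
        """Corrective actions for a violated soft constraint, or None if R is undefined."""
        routine = self.recovery.get(name)
        return routine(state) if routine else None


contract = Contract(
    preconditions=[Constraint("identity_verified", lambda s: bool(s.get("verified")))],
    invariants=[
        Constraint("no_pii", lambda s: not s.get("pii_emitted", False), hard=True),
        Constraint("professional_tone", lambda s: s.get("tone") == "professional", hard=False),
    ],
    recovery={"professional_tone": lambda s: ["reprompt_with_tone_guidelines"]},
)
state = {"verified": True, "pii_emitted": False, "tone": "casual"}
soft_violations = contract.violated(state, hard=False)
```

Keeping the recovery map partial mirrors the definition above: a soft constraint without an entry has no recovery path, which a real monitor would surface as a failed recovery.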

2. Probabilistic Satisfaction and Compliance Guarantees

AgentAssert extends contract satisfaction to stochastic, non-deterministic agents via $(p, \delta, k)$-satisfaction:

Let $C_{\mathrm{hard}}(t)$ and $C_{\mathrm{soft}}(t)$ denote the fraction of hard and soft constraints satisfied at step $t$, for a session of length $T$. An agent $A$ $(p, \delta, k)$-satisfies contract $C$ (notation: $A \models_{p,\delta,k} C$) if, with probability at least $p$:

  • Persistent compliance (Hard guarantee):

$$C_{\mathrm{hard}}(t) = 1 \quad \text{for all } t \in \{0, \ldots, T\}$$

  • Recoverable compliance (Soft guarantee):

$$C_{\mathrm{soft}}(t) < 1 - \delta \;\Longrightarrow\; \exists\, t' \in \{t+1, \ldots, t+k\} : C_{\mathrm{soft}}(t') \ge 1 - \delta$$

Here, $\delta$ bounds tolerable simultaneous soft constraint failures, and $k$ specifies the recovery horizon. These principles correspond to probabilistic computation tree logic (PCTL) statements for safety and liveness under stochastic policy execution.
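
The two conditions can be checked on recorded compliance traces. The sketch below is illustrative only: it assumes per-step compliance fractions in [0, 1], a tolerance `delta`, and a recovery horizon `k`, and it treats a soft violation with no recovery inside the observed trace as a failure. The function names are hypothetical, and the fraction of passing sessions is only an empirical estimate of the satisfaction probability.

```python
from typing import Sequence, Tuple


def session_satisfies(c_hard: Sequence[float], c_soft: Sequence[float],
                      delta: float, k: int) -> bool:
    """Check the hard and soft conditions on one recorded session."""
    # Hard guarantee: every hard constraint holds at every step.
    if any(score < 1.0 for score in c_hard):
        return False
    threshold = 1.0 - delta
    # Soft guarantee: each dip below the threshold is repaired within k steps.
    for t, score in enumerate(c_soft):
        if score < threshold:
            window = c_soft[t + 1 : t + 1 + k]
            if not any(later >= threshold for later in window):
                return False
    return True


def empirical_satisfaction(sessions: Sequence[Tuple[Sequence[float], Sequence[float]]],
                           delta: float, k: int) -> float:
    """Fraction of sessions that pass both checks."""
    passed = sum(session_satisfies(h, s, delta, k) for h, s in sessions)
    return passed / len(sessions)


recovers_fast = ([1, 1, 1, 1], [1.0, 0.5, 0.9, 1.0])
hard_breach = ([1, 0.5, 1], [1.0, 1.0, 1.0])
recovers_slow = ([1, 1, 1, 1], [1.0, 0.5, 0.6, 1.0])
sessions = [recovers_fast, hard_breach, recovers_slow]
```

With these made-up traces, widening the horizon from `k=1` to `k=2` lets the slow-recovery session pass, which shows how the horizon trades strictness against tolerance for gradual repair.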

3. Behavioral Drift: Modeling and Theoretical Limits

AgentAssert incorporates behavioral drift modeling to quantify agent deviations from contract compliance. Drift $D(t)$ is modeled as an Ornstein–Uhlenbeck process:

$$dD(t) = \bigl(\alpha - \gamma D(t)\bigr)\,dt + \sigma\,dW(t)$$

where:

  • $\alpha$: natural drift rate (uncontrolled agent divergence),
  • $\gamma$: recovery strength (contract enforcement efficacy),
  • $\sigma$: process noise amplitude,
  • $W(t)$: standard Wiener process.
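
To build intuition for such a mean-reverting drift model, one can simulate it numerically. The sketch below uses the Euler–Maruyama scheme on an Ornstein–Uhlenbeck process with a constant push `alpha`, a restoring strength `gamma`, and noise amplitude `sigma`; the function name, step size, and parameter values are illustrative assumptions, not taken from AgentAssert.

```python
import math
import random
from typing import List


def simulate_drift(alpha: float, gamma: float, sigma: float,
                   d0: float = 0.0, dt: float = 0.01,
                   steps: int = 20000, seed: int = 0) -> List[float]:
    """Euler-Maruyama path of dD = (alpha - gamma * D) dt + sigma dW."""
    rng = random.Random(seed)
    d = d0
    path = [d]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))      # Wiener increment over dt
        d += (alpha - gamma * d) * dt + sigma * dw
        path.append(d)
    return path


# Noise-free run: drift settles at alpha / gamma.
quiet = simulate_drift(alpha=0.2, gamma=1.0, sigma=0.0, steps=5000)

# Noisy run: drift fluctuates around the same level.
noisy = simulate_drift(alpha=0.2, gamma=1.0, sigma=0.05)
tail = noisy[len(noisy) // 2:]
tail_mean = sum(tail) / len(tail)
```

Without noise the path converges to `alpha / gamma`; with noise the long-run average stays close to that level, so stronger recovery (larger `gamma`) lowers the level the drift settles at.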

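The Drift Bounds Theorem establishes:

  • Stationary drift: $D^* = \alpha / \gamma$,
  • Mean drift bound: $\mathbb{E}[D(t)] \le \max\bigl(D(0), D^*\bigr)$ (enforcement cap),
  • Variance: $\lim_{t \to \infty} \operatorname{Var}[D(t)] = \sigma^2 / (2\gamma)$,
  • Tail bound: in the stationary regime, $\Pr\bigl(|D - D^*| \ge u\bigr) \le 2\exp\bigl(-\gamma u^2 / \sigma^2\bigr)$,
  • Exponential convergence: $\mathbb{E}[D(t)] - D^* = \bigl(D(0) - D^*\bigr)\,e^{-\gamma t}$,
  • Contract design criterion: For drift below $D_{\max}$ with probability $1 - \varepsilon$, select $\gamma$ such that

$$\frac{\alpha}{\gamma} + \sigma \sqrt{\frac{\ln(2/\varepsilon)}{\gamma}} \le D_{\max}$$

with the low-noise limit $\gamma \ge \alpha / D_{\max}$.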

This formalization enables engineering behavioral contracts with explicit drift and recovery guarantees (Bhardwaj, 25 Feb 2026).
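
A design calculation of this kind can be sketched numerically. The example below is an assumption-laden illustration rather than the paper's formula: it takes the stationary drift to be Gaussian with mean `alpha / gamma` and variance `sigma**2 / (2 * gamma)` (the standard Ornstein–Uhlenbeck stationary law), applies the two-sided Gaussian tail bound `2 * exp(-gamma * u**2 / sigma**2)`, and solves for the smallest recovery strength that keeps drift below a target `d_max` with probability at least `1 - eps`. The function name is hypothetical.

```python
import math


def min_recovery_strength(alpha: float, sigma: float,
                          d_max: float, eps: float) -> float:
    """Smallest gamma with alpha/gamma + sigma*sqrt(ln(2/eps)/gamma) <= d_max."""
    if alpha <= 0 or d_max <= 0 or sigma < 0 or not 0 < eps < 1:
        raise ValueError("need alpha > 0, d_max > 0, sigma >= 0, 0 < eps < 1")
    b = sigma * math.sqrt(math.log(2.0 / eps))
    # Substitute x = 1/sqrt(gamma): alpha*x^2 + b*x - d_max <= 0.
    x_max = (-b + math.sqrt(b * b + 4.0 * alpha * d_max)) / (2.0 * alpha)
    return 1.0 / (x_max * x_max)


def drift_level(alpha: float, sigma: float, gamma: float, eps: float) -> float:
    """Left-hand side of the design inequality for a given gamma."""
    return alpha / gamma + sigma * math.sqrt(math.log(2.0 / eps) / gamma)


gamma_quiet = min_recovery_strength(alpha=0.2, sigma=0.0, d_max=0.25, eps=0.05)
gamma_noisy = min_recovery_strength(alpha=0.2, sigma=0.05, d_max=0.25, eps=0.05)
```

With zero noise the result reduces to `alpha / d_max`; any noise pushes the required recovery strength above that value, since part of the drift budget must absorb fluctuations.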

4. Contract Composition in Multi-Agent Chains

AgentAssert supports composition of ABCs for multi-agent deployments, with the following guarantees:

  • Serial composition: Given the contracts of two agents and a handoff invariant between them, their serial composition is well-formed if four conditions hold:
    • Interface compatibility: what the upstream agent hands off matches what the downstream agent expects,
    • Assumption discharge: the upstream agent's guarantees, together with the handoff invariant, establish the downstream agent's preconditions,
    • Governance consistency: actions allowed by one contract do not conflict with prohibitions in the other,
    • Recovery independence: the recovery mechanism of one agent preserves the invariants of the other.
  • Probabilistic degradation: If each agent probabilistically satisfies its own contract, the composed chain probabilistically satisfies the composed contract with degraded parameters, determined by the per-agent parameters together with the success and drift parameters of the handoff.

The degradation bounds extend to chains of $N$ agents. This formalizes multi-agent reliability under ABC/AgentAssert (Bhardwaj, 25 Feb 2026).
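
How guarantees weaken along a chain can be illustrated with a toy calculation. The sketch below is not the paper's degradation bound: it assumes, purely for illustration, that agent and handoff successes are independent (so probabilities multiply) and that tolerances accumulate additively. The names `Guarantee`, `compose`, and `compose_chain` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Guarantee:
    p: float       # probability that the guarantee holds
    delta: float   # tolerated soft-compliance shortfall


def compose(a: Guarantee, b: Guarantee, handoff: Guarantee) -> Guarantee:
    """Assumed rule: independent successes multiply, tolerances add."""
    return Guarantee(p=a.p * b.p * handoff.p,
                     delta=a.delta + b.delta + handoff.delta)


def compose_chain(agents: Sequence[Guarantee],
                  handoffs: Sequence[Guarantee]) -> Guarantee:
    """Fold a chain of N agents joined by N - 1 handoffs."""
    if len(handoffs) != len(agents) - 1:
        raise ValueError("need exactly one handoff between consecutive agents")
    result = agents[0]
    for agent, handoff in zip(agents[1:], handoffs):
        result = compose(result, agent, handoff)
    return result


pair = compose(Guarantee(0.9, 0.10), Guarantee(0.8, 0.05), Guarantee(1.0, 0.0))
chain = compose_chain([Guarantee(0.95, 0.02)] * 4, [Guarantee(0.99, 0.01)] * 3)
```

Even under these optimistic assumptions, four agents at 0.95 each with three handoffs at 0.99 end up near 0.79, which is why per-agent guarantees need headroom in long chains.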

5. AgentAssert Architecture and API

AgentAssert is a modular Python library with overhead below 10 ms per action, designed for practical integration and extensibility. The key layers include:

  • Parser & Validator: Loads ContractSpec YAML contracts, validates schema consistency.
  • Constraint Evaluator: Checks the contract's predicates against agent state/action; computes the hard and soft compliance scores.
  • Metric Tracker: Maintains the JSD-based drift measure, compliance time series, recovery logs, and a stress resilience index.
  • Runtime Monitor: Orchestrates enforcement per agent turn: evaluates constraints; updates drift and compliance; emits violation/drift events; handles recovery via the recovery mechanism if needed; resets state on re-satisfaction.
  • Recovery Executor: Dispatches corrective actions per a taxonomy (LLM re-prompt, tool call, human escalation).
  • Integration Hooks: Framework-agnostic adapters for LangChain, AutoGen, and custom agent infrastructure.
  • Benchmark Runner: Supports evaluation on synthetic AgentContract-Bench and live agent sessions.
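
A JSD-based drift measure like the one tracked above can be computed from two discrete distributions. The sketch below is a generic Jensen–Shannon divergence in base 2, so values lie in [0, 1]; comparing action-type frequencies from a reference window against a recent window is an assumption made here for illustration, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def action_distribution(actions: Iterable[str]) -> Dict[str, float]:
    """Empirical frequency of each action type."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    keys = set(p) | set(q)
    mix = {key: 0.5 * (p.get(key, 0.0) + q.get(key, 0.0)) for key in keys}

    def kl_to_mix(dist: Dict[str, float]) -> float:
        # Terms with zero probability contribute nothing to the sum.
        return sum(v * math.log2(v / mix[key]) for key, v in dist.items() if v > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


reference = action_distribution(["search", "search", "answer", "answer"])
recent = action_distribution(["search", "answer", "answer", "trade"])
drift = js_divergence(reference, recent)
```

The divergence is symmetric, zero for identical distributions, and bounded, which makes it convenient as a per-turn drift signal.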

Under typical enterprise contracts, the total overhead is 5–10 ms per action, far below LLM inference latency (200–2000 ms) (Bhardwaj, 25 Feb 2026).
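
The per-turn enforcement flow described for the Runtime Monitor can be sketched as follows. This is a hypothetical illustration, not AgentAssert's API: the class `RuntimeMonitor`, its `on_action` method, the event names, and the example constraints are all assumptions, and steps are plain dictionaries.

```python
import time
from typing import Callable, Dict, List, Tuple

Step = dict
Check = Callable[[Step], bool]
Fix = Callable[[Step], None]


class RuntimeMonitor:
    def __init__(self, hard: Dict[str, Check], soft: Dict[str, Check],
                 recovery: Dict[str, Fix]) -> None:
        self.hard = hard
        self.soft = soft
        self.recovery = recovery
        self.events: List[Tuple[str, str]] = []
        self.last_overhead_ms = 0.0

    def on_action(self, step: Step) -> bool:
        """Evaluate one agent turn; return False if the action must be blocked."""
        start = time.perf_counter()
        try:
            # Hard constraints: any violation blocks the action outright.
            for name, check in self.hard.items():
                if not check(step):
                    self.events.append(("hard_violation", name))
                    return False
            # Soft constraints: log the violation, then attempt recovery.
            for name, check in self.soft.items():
                if check(step):
                    continue
                self.events.append(("soft_violation", name))
                fix = self.recovery.get(name)
                if fix is not None:
                    fix(step)
                recovered = fix is not None and check(step)
                self.events.append(("recovered" if recovered else "recovery_failed", name))
            return True
        finally:
            self.last_overhead_ms = (time.perf_counter() - start) * 1000.0


monitor = RuntimeMonitor(
    hard={"no_forbidden_tool": lambda s: s["tool"] != "delete_database"},
    soft={"within_budget": lambda s: s["cost"] <= 1.0},
    recovery={"within_budget": lambda s: s.update(cost=1.0)},
)
allowed = monitor.on_action({"tool": "search", "cost": 2.5})
blocked = monitor.on_action({"tool": "delete_database", "cost": 0.1})
```

Timing the check inside the monitor makes the enforcement overhead directly observable per action, which is the quantity the overhead figures above refer to.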

6. Benchmarking and Empirical Results

AgentAssert is evaluated on AgentContract-Bench, comprising 200 scenarios across five domains, 50 governance stress profiles, and 50 composition cases. Each scenario consists of a 5–8 step trace with ground-truth violation annotations.

Metric | Value/Range | Notes
Detection Accuracy | 1.0000 (all scenarios) | All annotated violations flagged
Hard Compliance | …–… (domains), … (stress), … (composition) | Safety-critical constraint adherence
Soft Compliance | …–… (domains), … (composition) | Liveness property adherence
Mean Drift | …–… (domains), … (composition) | Behavioral divergence measure
Reliability Index | …–… (domains), … (composition) | Composite performance score (Def. 3.12)
Overhead | < 10 ms/action | … constraints

In 1,980 live E1 sessions across seven LLM models and six vendors, contracted agents surface 5.2–6.8 soft violations per session, versus 0.0–0.3 per session for uncontracted agents; the comparison uses Welch's $t$-test with Bonferroni correction. Hard constraint compliance under stress is 88–100%, and behavioral drift remains bounded. Recovery is 100% for frontier models and 17–100% across all models, all at sub-10 ms enforcement overhead (Bhardwaj, 25 Feb 2026).
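
For readers unfamiliar with the statistical comparison named above, the sketch below computes Welch's t statistic with Welch–Satterthwaite degrees of freedom, plus a Bonferroni-adjusted significance threshold. The sample values are made up for illustration and are not the study's data; the function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)      # squared standard error of each mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / comparisons


# Synthetic violations-per-session counts, for illustration only.
contracted = [6.0, 5.5, 6.5, 6.2, 5.8]
uncontracted = [0.1, 0.0, 0.3, 0.2, 0.1]
t_stat, dof = welch_t(contracted, uncontracted)
threshold = bonferroni_alpha(0.05, comparisons=7)
```

Welch's variant does not assume equal variances in the two groups, which suits a comparison where one group shows almost no variation.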

7. Significance and Applications

AgentAssert enables practitioners to specify, enforce, and monitor behavioral properties of autonomous agents with millisecond-level runtime checks, using formal YAML contracts and integration with popular LLM agent frameworks. The provision of probabilistic compliance guarantees and empirical drift bounding supports reliable agentic deployments under uncertainty and across composed workflows. As empirical results on AgentContract-Bench and live LLM deployments show, AgentAssert provides effective detection, high compliance, bounded behavioral drift, and robust recovery, addressing key governance and control gaps in agentic AI systems (Bhardwaj, 25 Feb 2026).

