
Auditable Agents

Published 7 Apr 2026 in cs.AI | (2604.05485v1)

Abstract: LLM agents call tools, query databases, delegate tasks, and trigger external side effects. Once an agent system can act in the world, the question is no longer only whether harmful actions can be prevented--it is whether those actions remain answerable after deployment. We distinguish accountability (the ability to determine compliance and assign responsibility), auditability (the system property that makes accountability possible), and auditing (the process of reconstructing behavior from trustworthy evidence). Our claim is direct: no agent system can be accountable without auditability. To make this operational, we define five dimensions of agent auditability, i.e., action recoverability, lifecycle coverage, policy checkability, responsibility attribution, and evidence integrity, and identify three mechanism classes (detect, enforce, recover) whose temporal information-and-intervention constraints explain why, in practice, no single approach suffices. We support the position with layered evidence rather than a single benchmark: lower-bound ecosystem measurements suggest that even basic security prerequisites for auditability are widely unmet (617 security findings across six prominent open-source projects); runtime feasibility results show that pre-execution mediation with tamper-evident records adds only 8.3 ms median overhead; and controlled recovery experiments show that responsibility-relevant information can be partially recovered even when conventional logs are missing. We propose an Auditability Card for agent systems and identify six open research problems organized by mechanism class.

Summary

  • The paper introduces a five-dimensional auditability framework that establishes necessary conditions for complete system traceability and accountability.
  • It combines detect, enforce, and recover mechanisms to capture comprehensive action logs and enable reliable policy compliance verification.
  • Empirical evaluations demonstrate practical metrics and highlight the need for auditability-by-design to overcome existing security and logging limitations.

Auditable Agents: A Five-Dimensional Systems Framework

Motivation and Problem Statement

The paper "Auditable Agents" (2604.05485) asserts that for LLM-based autonomous agent systems acting in the physical or digital world (e.g., via file manipulation, messaging, triggering side effects), the safety evaluation paradigm must shift from only pre-deployment alignment and monitoring to a holistic, system-level property—auditability. The authors distinguish accountability (the ability to determine compliance and responsibility for actions), auditability (system capabilities that enable accountability—i.e., the existence and integrity of evidence), and auditing (the process of reconstructing behavior from evidence). The core claim is explicit: no agent system can be accountable without auditability.

Five-Dimensional Auditability Framework

The central technical contribution is a five-dimensional framework that specifies necessary conditions for a deployed agent system to be auditable, thus enabling ex post accountability. The authors define the following dimensions as logically and operationally required:

  • Action Recoverability: Sufficient policy-relevant actions are captured in the record, with necessary fields for reconstruction (measured by Action Coverage Rate and Record Fidelity).
  • Lifecycle Coverage: The recorded trace spans all execution phases (retries, fallbacks, approvals, delegation) rather than just successful end states.
  • Policy Checkability: The audit record contains enough information for policy compliance to be mechanically decided. Notably, certain policies can be rendered undecidable (verdict = ⊥) by schema omissions, regardless of how much is logged.
  • Responsibility Attribution: The system must support recovering a chain (or subgraph) of responsibility linking each action to principals (agent, skill, tool, user, etc.).
  • Evidence Integrity: The provenance and tamper-evidence of records (none/append-only/hash-chained/signed) are foundational; all other dimensions are moot without trustworthy records.

These dimensions cohere into a formal predicate: an execution is auditable (for a set of policies Π and thresholds θ) iff all existence and quality metrics (e.g., ACR, RF, LPC, SPDR, AC, IS) meet their thresholds and the gap burden does not exceed its limit. The authors formally show that omitting any field required by a policy can render compliance verification impossible (Proposition 1).
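
The predicate's shape can be sketched as a threshold conjunction (the aggregation rule and example values below are assumptions; the paper defines the metrics but this summary does not give its exact formula):

```python
# Hedged sketch of the auditability predicate: an execution is auditable for a
# policy set iff every existence/quality metric meets its threshold AND the
# gap burden stays within its limit. Metric names follow the summary (ACR,
# RF, LPC, ...); the values and aggregation rule here are illustrative.

def is_auditable(metrics: dict[str, float], thresholds: dict[str, float],
                 gap_burden: float, max_gap: float) -> bool:
    all_met = all(metrics.get(name, 0.0) >= theta
                  for name, theta in thresholds.items())
    return all_met and gap_burden <= max_gap

metrics = {"ACR": 0.98, "RF": 0.95, "LPC": 0.90}
thresholds = {"ACR": 0.95, "RF": 0.90, "LPC": 0.90}
print(is_auditable(metrics, thresholds, gap_burden=0.02, max_gap=0.05))  # True
```

A missing metric defaults to 0.0, so absence of evidence fails the predicate rather than passing silently.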

Mechanism Classes and Lifecycle Constraints

The paper exposes a temporal, informational asymmetry: No single layer (static, runtime, post-hoc) can guarantee all dimensions. The authors define three mechanism classes:

  • Detect: Pre-deployment static analysis of code, configuration, supply chain; can only flag likely auditability gaps, not observed runtime behavior.
  • Enforce: Inline runtime mediation (intercept/approve/block actions, emit signed records); can create trustworthy evidence for live events but not fill in missing evidence post hoc.
  • Recover: Post-hoc aggregation and reconstruction of events and responsibility from surviving records or output, often in settings where logs are incomplete, distributed, or redacted.

Empirically, robust auditability requires all three. Auditable systems must instrument code and configuration appropriately (detect), have enforceable, policy-linked runtime mediation (enforce), and support recovery and attribution even from degraded records (recover).
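
The enforce class can be sketched in miniature: a mediator that decides each tool call before execution and appends a hash-chained, tamper-evident record. The policy and tool interfaces below are illustrative assumptions; a production system (as in the paper's runtime results) would additionally sign each entry.

```python
# Sketch of the "enforce" mechanism class: pre-execution mediation that blocks
# disallowed tool calls and appends every decision to a hash-chained log.
# Interfaces are illustrative; real deployments would also sign entries.
import hashlib, json, time

class MediatedLog:
    def __init__(self):
        self.entries, self.prev_hash = [], "0" * 64

    def append(self, event: dict) -> None:
        payload = json.dumps({**event, "prev": self.prev_hash}, sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"event": event, "prev": self.prev_hash, "hash": digest})
        self.prev_hash = digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps({**e["event"], "prev": prev}, sort_keys=True)
            if e["prev"] != prev or e["hash"] != hashlib.sha256(payload.encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

def mediate(tool: str, args: dict, allowed: set, log: MediatedLog) -> bool:
    """Record the decision for every call, permitted or blocked."""
    verdict = tool in allowed
    log.append({"tool": tool, "args": args, "allowed": verdict, "ts": time.time()})
    return verdict

log = MediatedLog()
mediate("read_file", {"path": "a.txt"}, {"read_file"}, log)  # permitted
mediate("delete_db", {}, {"read_file"}, log)                 # blocked, still recorded
print(log.verify())  # True; editing any entry makes verify() return False
```

Note that blocked calls are logged too: lifecycle coverage requires recording denials and approvals, not just successful actions.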

Evidence for the Auditability Gap

The authors substantiate the systemic lack of auditability with layered evidence corresponding to the mechanism classes:

  1. Ecosystem Lower Bound (Detect): Static analysis of six prominent agent frameworks found 617 security findings (64% Tool Misuse, 11% Inter-Agent Comms, etc.), indicating widespread absence of auditability prerequisites. Many systems lack reliable action logging, approval context, provenance tracking, or protected logs.
  2. Runtime Feasibility (Enforce): The Aegis firewall intercepts all tool calls, enabling policy mediation, human approval, and tamper-evident signed records with 8.3 ms median overhead and high fidelity; 48/48 curated attacks were blocked with only 1.2% false positives on benign calls, showing practical implementability.
  3. Recovery Frontier (Recover): With explicit logs absent, implicit execution tracing (IET) with token-level watermarking allows post-hoc attribution and topology reconstruction in multi-agent dialogues, achieving ~0.93 IoU and 0.95 token-level accuracy, although policy checkability and integrity cannot be retrofitted.
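
The topology-reconstruction score can be read as edge-set intersection-over-union. A sketch under that standard IoU definition (the paper's exact variant may differ, and the agent names are invented):

```python
# Sketch: IoU between a reconstructed multi-agent communication topology and
# the ground truth, both treated as sets of directed edges. This is the usual
# IoU definition; the paper's exact metric may differ in detail.

def edge_iou(recovered: set[tuple[str, str]], truth: set[tuple[str, str]]) -> float:
    union = recovered | truth
    return len(recovered & truth) / len(union) if union else 1.0

truth = {("planner", "coder"), ("coder", "tester"), ("planner", "tester")}
recovered = {("planner", "coder"), ("coder", "tester")}
print(round(edge_iou(recovered, truth), 2))  # 0.67
```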

It is explicitly demonstrated that no single evidence block or mechanism class can alone secure all audit dimensions; only their combination suffices.

Positioning to Prior Work

Through a structured comparison table, the authors show that state-of-the-art safety benchmarks, runtime enforcement frameworks, observability tools, audit documentation paradigms, and even recent agent accountability systems fail to comprehensively address at least one of Lifecycle Coverage or Evidence Integrity. Most systems do not support policy-decidable, tamper-evident, lifecycle-complete, fully-attributable audit records. The established frameworks for dataset/model documentation (e.g., model cards, datasheets) inform, but do not solve, the agent audit problem.

Auditability Card and Open Challenges

To catalyze adoption, the paper introduces the Auditability Card: a compact, system-level reporting artifact requiring agent developers to disclose, for each dimension, what is (and is not) covered (actions, phases, policy checkability, attribution, integrity, and handling of missing logs). The card—analogous to model cards—is positioned as a required element in system papers, benchmarks, frameworks, and skill ecosystems.
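
As one concrete rendering (the paper proposes the card's dimensions but this summary fixes no serialization, so every field name and value below is illustrative), a card could declare coverage per dimension:

```python
# Illustrative Auditability Card. Field names and values are assumptions;
# the paper defines the five dimensions to disclose, not this serialization.
import json

auditability_card = {
    "system": "example-agent-stack",
    "action_recoverability": {"covered": ["tool_call", "file_write"],
                              "uncovered": ["subprocess_spawn"]},
    "lifecycle_coverage": {"phases": ["plan", "retry", "approval", "delegation"],
                           "gaps": ["fallback_paths"]},
    "policy_checkability": {"decidable_policies": 12, "undecidable": 3},
    "responsibility_attribution": {"chain": ["user", "agent", "skill", "tool"]},
    "evidence_integrity": "hash-chained",  # none | append-only | hash-chained | signed
    "missing_log_handling": "implicit execution tracing fallback",
}

print(json.dumps(auditability_card, indent=2))
```

The value of such a card, like a model card, lies as much in what it forces developers to admit is *not* covered as in what it reports.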

The paper concludes by identifying six open research problems, mapped to detect, enforce, and recover: predictive static analysis of auditability gaps, provenance minimization for dynamic skills, runtime attribution chain capture, semantic (non-structural) policy verifiability, adversarial recovery under information loss, and federated audit aggregation across fragmented trust boundaries.

Limitations

The authors note several limitations: the evaluation relies on their own tooling and does not include a full end-to-end, full-lifecycle stack; only open-source systems are assessed; thresholds for the auditability metrics are left deployment-specific; only structural policies (not high-level semantic or contextual ones) are addressable; the sufficiency and irreducibility of the five dimensions are not formally proved and may be invalidated by future agent classes; and the privacy-versus-audit-fidelity tradeoff remains open.

Implications and Prospects

This framework has immediate practical and theoretical consequences:

  • Practical Deployment: To achieve regulatory and societal accountability, agent architectures must emit structured, integrity-protected, phase-complete, policy-checkable traces by design, not retrofitted post-hoc. Logging and mediation must be first-class citizens in agent development, not just prompts and safety tuning.
  • Evaluation: Agent benchmarks and competitions must augment (or replace) task-performance and safety metrics with auditability scores and open audit cards.
  • Security and Supply Chain: Validating agent skill provenance, trust boundaries, and permissions must converge with auditability. Supply-chain-oriented signing (sigstore, hash-chained evidence) is required.
  • Theory: Execution trace formalization, monotonic auditability predicates, and policy decidability connect to more foundational system accountability literature.
  • Future Directions: As agent ecosystems diversify (cross-org deployments, embodied agents), open problems around semantic policy decidability, federated auditing, and privacy/evidence-integrity tradeoffs will become more acute.

Conclusion

The paper delivers a rigorous, multi-layered systems position: agent auditability is a prerequisite for meaningful accountability; auditability is a multifactor property that cannot be guaranteed by pre-deployment testing, runtime enforcement, or post-hoc recovery alone. A full framework must support action recovery, lifecycle coverage, policy decidability, responsibility attribution, and cryptographic evidence integrity. Under these constraints, agent architectures and evaluation regimes must evolve toward auditability-by-design. If widely adopted, this paradigm would force agentic AI to be not just safe and performant, but answerable—closing the gap between theoretical safety and operational responsibility.
