VeriAct: AI Auditability & Regulatory Compliance
- VeriAct is a unified framework for verifiability, auditability, and compliance in AI systems, incorporating cryptographic records and formal specification synthesis.
- It employs an interrelated architecture with components like Action Attestation Layers, Audit Agents, and Challenge-Response protocols to secure tamper-evident logs and deliver real-time risk assessments.
- VeriAct supports regulatory adherence by mapping rigorous verification activities to standards such as the EU AI Act, while remaining scalable and practical to implement.
VeriAct refers to a series of interrelated frameworks and methodologies for verifiability, auditability, and compliance in AI agent execution, formal specification synthesis, and high-risk system assurance. It enables provable monitoring, alignment, and conformity verification for autonomous LLM agents, formal specification generation, and regulatory compliance, particularly under regimes such as the EU AI Act. VeriAct is characterized by agentic, cryptographically rigorous, and algorithmically structured mechanisms that enable end-to-end verifiability and make LLM-based and agentic systems reliably auditable, correct, and compliant (Gupta, 19 Dec 2025, Misu et al., 31 Mar 2026, Buscemi et al., 15 Dec 2025, Zhang, 23 Feb 2026).
1. Verifiability-First Architecture for Autonomous LLM Agents
VeriAct establishes explicit observability and auditability boundaries around LLM agents through several tightly integrated components (Gupta, 19 Dec 2025):
- Intent Specification (ISpec): A machine-readable “constitution” that precisely specifies allowed goals, forbidden operations, and detailed policy rules.
- Action Attestation Layer (AAL): Cryptographic wrappers that record every tool invocation or external effect within a tamper-evident Provenance Log. Each attested action is encapsulated as a receipt, with cryptographic hashes and digital signatures ensuring integrity, non-repudiation, and authenticity.
- Audit Agents (AAs): Lightweight rule/statistical/semantic ensembles that asynchronously monitor provenance logs, computing real-time aggregate alignment scores against ISpec. This ensemble combines rule-based verifiers, statistical classifiers (e.g., shallow transformers trained on aligned/misaligned sequences), and fast entailment checks using NLI models.
- Challenge-Response Attestation (CRA): When uncertainties or violations arise, a two-party protocol is invoked: the Verifier Stack (VS) issues a cryptographically signed challenge, and the agent responds with an explanation and a commitment opening, optionally accompanied by zero-knowledge proofs (ZKPs).
- Controller & Remediator: Automatically triggers safe-mode or tool blocks if agent behavior is flagged as risky or non-compliant.
This architecture guarantees that every consequential action is cryptographically attested, undergoes both symbolic and statistical validation, and triggers an interactive audit when a specified risk threshold is crossed.
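The receipt mechanism of the Action Attestation Layer can be illustrated with a minimal sketch. The field names, the `GENESIS` marker, and the `SIGNING_KEY` below are hypothetical and are not taken from the cited work. HMAC is used only as a standard-library stand-in for a digital signature: it provides integrity and authenticity under a shared key but not non-repudiation, which would require an asymmetric scheme.

```python
import hashlib
import hmac
import json

# Hypothetical demo values; a real deployment would use an asymmetric key pair.
SIGNING_KEY = b"demo-agent-key"
GENESIS = "0" * 64
BODY_FIELDS = ("prev_hash", "tool", "args_hash", "result_hash")


def _payload(body: dict) -> bytes:
    """Canonical byte encoding of the receipt body."""
    return json.dumps(body, sort_keys=True).encode()


def make_receipt(prev_hash: str, tool: str, args: dict, result: str) -> dict:
    """Build a receipt that commits to one action and to the previous receipt."""
    body = {
        "prev_hash": prev_hash,
        "tool": tool,
        "args_hash": hashlib.sha256(_payload(args)).hexdigest(),
        "result_hash": hashlib.sha256(result.encode()).hexdigest(),
    }
    receipt = dict(body)
    receipt["hash"] = hashlib.sha256(_payload(body)).hexdigest()
    receipt["sig"] = hmac.new(SIGNING_KEY, _payload(body), hashlib.sha256).hexdigest()
    return receipt


def verify_log(log: list) -> bool:
    """Check chaining, hashes, and signatures of every receipt in order."""
    prev = GENESIS
    for receipt in log:
        body = {field: receipt[field] for field in BODY_FIELDS}
        expected_sig = hmac.new(SIGNING_KEY, _payload(body), hashlib.sha256).hexdigest()
        if receipt["prev_hash"] != prev:
            return False
        if receipt["hash"] != hashlib.sha256(_payload(body)).hexdigest():
            return False
        if not hmac.compare_digest(receipt["sig"], expected_sig):
            return False
        prev = receipt["hash"]
    return True


# Two attested tool calls, each chained to the one before it.
log = []
log.append(make_receipt(GENESIS, "search", {"q": "weather"}, "sunny"))
log.append(make_receipt(log[-1]["hash"], "send_email", {"to": "a@example.com"}, "ok"))

# Rewriting an earlier receipt breaks its hash and signature.
tampered = [dict(r) for r in log]
tampered[0]["tool"] = "delete_files"

log_is_valid = verify_log(log)
tampered_is_valid = verify_log(tampered)
```

Because each receipt embeds the hash of its predecessor, removing or reordering receipts is detected in the same way as editing one.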
2. Formal Specification Synthesis and Agentic Loop
VeriAct also denotes an agentic, LLM-guided framework for synthesizing and repairing Java Modeling Language (JML) specifications that are both verifiable and meaningfully correct/complete (Misu et al., 31 Mar 2026). The core mechanism is a closed agentic loop consisting of:
- LLM-Driven Planning: The LLM reads the Java method and proposes initial pre- and postconditions in JML.
- Automated Verification: Invokes OpenJML to automatically verify the candidate specification against the target method.
- Spec-Harness Feedback: If verification passes, the Spec-Harness computes PreCorr, PreComp, PostCorr, and PostComp metrics using symbolic Hoare-triple checks against both valid and mutated test pairs. These metrics measure whether postconditions are satisfied on correct executions (PostCorr) and reject mutated outputs (PostComp), and analogously for preconditions (PreCorr, PreComp).
- Specification Repair: The agent uses structured feedback (either from OpenJML error logs or from Spec-Harness metrics) to iteratively refine the JML specification.
- Termination: The loop continues until both PostCorr and PostComp exceed a predefined threshold or a maximum number of iterations is reached.
This method reveals that high verifier pass rates (VR) are insufficient: many specifications accepted by OpenJML are under- or over-constrained. Spec-Harness metrics and VeriAct’s iterative repair push beyond the "prompt-engineering" ceiling, achieving significantly higher Meaningfully Verified Rates (MVR) than previous classical or prompt-optimized approaches. For instance, VeriAct achieves MVR ≈ 23–24% on FormalBench, outperforming prompt-only methods (MVR ≈ 11%) (Misu et al., 31 Mar 2026).
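The gap between "verified" and "meaningfully verified" can be made concrete with a small sketch of the postcondition metrics and the termination rule. It is an illustration under stated assumptions, not the Spec-Harness itself: postconditions are modelled as plain Python predicates evaluated on concrete test pairs rather than checked symbolically, the target method is `abs`, and the candidate list stands in for successive LLM repairs. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[int, int]                 # (input, output)
Post = Callable[[int, int], bool]      # postcondition over (input, result)


def post_corr(post: Post, valid_pairs: List[Pair]) -> float:
    """Fraction of correct executions on which the postcondition holds."""
    return sum(post(x, r) for x, r in valid_pairs) / len(valid_pairs)


def post_comp(post: Post, mutated_pairs: List[Pair]) -> float:
    """Fraction of mutated (wrong) outputs that the postcondition rejects."""
    return sum(not post(x, r) for x, r in mutated_pairs) / len(mutated_pairs)


# Test pairs for abs(x): correct outputs and deliberately wrong ones.
valid = [(x, abs(x)) for x in (-3, -1, 0, 2, 5)]
mutated = [(-3, 4), (-1, 0), (0, 1), (2, -2), (5, 4)]

# Successive candidate postconditions (stand-ins for repair iterations).
candidates: List[Tuple[str, Post]] = [
    ("result >= 0", lambda x, r: r >= 0),
    ("result >= 0 && (result == x || result == -x)",
     lambda x, r: r >= 0 and (r == x or r == -x)),
]


def first_meaningful(threshold: float = 0.9, max_iters: int = 5) -> Optional[int]:
    """Return the index of the first candidate whose PostCorr and PostComp
    both reach the threshold, or None if the iteration budget runs out."""
    for i, (_, post) in enumerate(candidates[:max_iters]):
        if post_corr(post, valid) >= threshold and post_comp(post, mutated) >= threshold:
            return i
    return None


weak_scores = (post_corr(candidates[0][1], valid), post_comp(candidates[0][1], mutated))
strong_scores = (post_corr(candidates[1][1], valid), post_comp(candidates[1][1], mutated))
accepted = first_meaningful()
```

The first candidate holds on every correct execution but rejects only one of the five mutated outputs, so it is under-constrained even though it is true of the method; the second rejects all of them and is the one the loop accepts.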
3. Regulatory Compliance: EU AI Act and Conformity Assessment
VeriAct further refers to a compliance verification framework developed for assessing high-risk AI systems under the EU AI Act (Buscemi et al., 15 Dec 2025). The approach decomposes legal requirements along two fundamental axes:
- Type of method: Controls versus Testing.
- Target of assessment: Data, Model, Processes, Final Product.
A mapping function links each legal requirement (such as Robustness, Accuracy, or Transparency) to concrete verification activities (e.g., stress testing, calibration, model cards). The compliance workflow encompasses:
- Risk classification (e.g., Article 6 of the AI Act).
- Requirements mapping via the mapping function to generate an evidence package.
- Implementation of controls and running of tests appropriate for each assessment target.
- Documentation according to legal annexes and internal audits.
- Drafting a conformity report.
An illustrative example is the verification of a vehicular Intrusion Detection System (IDS), in which robustness, accuracy, and transparency obligations are decomposed across controls and tests, data, models, processes, and final products, providing comprehensive, traceable documentation and assessment coverage (Buscemi et al., 15 Dec 2025).
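The two-axis decomposition and the requirements mapping can be sketched as a small lookup structure. This is a hypothetical illustration: the enum and function names are invented here, and the assignment of each activity to a particular method and target cell is an assumption made for the example rather than the mapping defined in the cited work.

```python
from enum import Enum


class Method(Enum):
    CONTROL = "control"
    TESTING = "testing"


class Target(Enum):
    DATA = "data"
    MODEL = "model"
    PROCESSES = "processes"
    FINAL_PRODUCT = "final product"


# Requirement -> list of (method, target, activity). Cell assignments are assumed.
MAPPING = {
    "Robustness": [(Method.TESTING, Target.MODEL, "stress testing")],
    "Accuracy": [(Method.TESTING, Target.MODEL, "calibration")],
    "Transparency": [(Method.CONTROL, Target.FINAL_PRODUCT, "model card")],
}


def evidence_package(requirements):
    """Expand each requirement into the verification activities mapped to it."""
    package = []
    for requirement in requirements:
        for method, target, activity in MAPPING.get(requirement, []):
            package.append({
                "requirement": requirement,
                "method": method.value,
                "target": target.value,
                "activity": activity,
            })
    return package


def unmapped(requirements):
    """Requirements that have no verification activity assigned yet."""
    return [r for r in requirements if not MAPPING.get(r)]


def uncovered_cells(package):
    """(method, target) cells of the matrix that no activity touches."""
    all_cells = {(m.value, t.value) for m in Method for t in Target}
    used = {(entry["method"], entry["target"]) for entry in package}
    return all_cells - used


package = evidence_package(["Robustness", "Accuracy", "Transparency"])
gaps = unmapped(["Robustness", "Cybersecurity"])
empty_cells = uncovered_cells(package)
```

Listing unmapped requirements and empty matrix cells is one simple way to surface gaps before a conformity report is drafted.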
4. Verifiability Kernel and the “Right to History”
VeriAct, as conceptualized in personal-agent execution on local hardware, generalizes the "Right to History" principle: every individual is entitled to a complete, tamper-evident, independently verifiable record of every AI agent action (Zhang, 23 Feb 2026). This framework enforces five formal invariants:
- Append-Only: The log is strictly append-only; the Merkle tree commitment makes any modification or deletion of an existing entry detectable.
- Completeness: Any permitted action passing validation and resource checks appears in the log.
- Integrity: The Merkle root at any time commits to the entire event log.
- Boundary Enforcement: Actions outside declared capabilities are rejected and never logged.
- Energy Conservation: Actors cannot exceed their allocated computation or resource budgets.
The reference architecture consists of a trusted kernel containing capability isolation, energy budget governance, an append-only, Merkle-committed log (RFC 6962), and a human-approval "hold" mechanism for sensitive actions. Tamper-evidence and detection guarantees derive from Merkle proofs, external anchoring (e.g., blockchain), and strict pipeline validation. The implementation achieves sub-1.3 ms median latency, ~400 actions/sec throughput, and scalable 448-byte Merkle proofs for 10,000 entries on commodity hardware (Zhang, 23 Feb 2026).
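The Merkle-committed log and its proof size can be illustrated with a minimal sketch that follows the RFC 6962 hashing conventions (SHA-256, a `0x00` prefix for leaves, a `0x01` prefix for interior nodes, and a split at the largest power of two below the subtree size). The function names and the recursive verifier are illustrative choices, not the cited implementation. For 10,000 entries the longest audit path contains 14 hashes of 32 bytes each, i.e. 448 bytes, which agrees with the figure quoted above.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (requires n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)


def merkle_root(leaves: list) -> bytes:
    """Root hash over a non-empty list of leaf hashes."""
    n = len(leaves)
    if n == 1:
        return leaves[0]
    k = split_point(n)
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(index: int, leaves: list) -> list:
    """Audit path for leaves[index], ordered from the leaf up to the root."""
    n = len(leaves)
    if n == 1:
        return []
    k = split_point(n)
    if index < k:
        return inclusion_proof(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_proof(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def root_from_proof(index: int, n: int, leaf: bytes, proof: list) -> bytes:
    """Recompute the root from one leaf hash and its audit path."""
    if n == 1:
        if proof:
            raise ValueError("proof too long")
        return leaf
    if not proof:
        raise ValueError("proof too short")
    k = split_point(n)
    if index < k:
        return node_hash(root_from_proof(index, k, leaf, proof[:-1]), proof[-1])
    return node_hash(proof[-1], root_from_proof(index - k, n - k, leaf, proof[:-1]))


# A log of 10,000 hypothetical action records.
entries = [f"action-{i}".encode() for i in range(10_000)]
leaves = [leaf_hash(e) for e in entries]
root = merkle_root(leaves)

proof = inclusion_proof(0, leaves)
proof_bytes = sum(len(h) for h in proof)
proof_ok = root_from_proof(0, len(leaves), leaves[0], proof) == root
forged_ok = root_from_proof(0, len(leaves), leaf_hash(b"forged"), proof) == root
```

Audit paths in a tree whose size is not a power of two vary in length; the last entry of this log, for example, needs only 8 hashes.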
5. Quantitative Evaluation and Benchmarking
VeriAct mechanisms have been evaluated using several concrete benchmarks:
- OPERA Benchmark Suite (Gupta, 19 Dec 2025): Measures observability, attribution confidence, red-team robustness, time-to-detection, remediation latency, false positive/negative rates, and a composite verifiability score. Results show significant reductions in time-to-detect (from 21.8 s to 11.9 s with VeriAct), high attribution confidence (AC = 0.85), and low runtime overhead (~6.5% per action).
- Spec-Harness Metrics (Misu et al., 31 Mar 2026): Discriminate between specifications that are merely verified and those that are meaningfully correct and complete. Agentic, verification-guided specification synthesis (VeriAct) achieves higher MVRs and closes completeness gaps missed by classical and prompt-only methods.
- Compliance Coverage (Buscemi et al., 15 Dec 2025): The method-by-target matrix ensures no regulatory obligation is omitted; explicit mapping to legal Articles facilitates clarity and traceability in assessment practices.
6. Practical Instantiations and Scaling Considerations
VeriAct is designed for deployment in highly heterogeneous, multi-modal, and federated agent environments:
- Tool Integration: Every API/tool (vision, code, search) must be mediated by the Action Attestation Layer, exposing only attested interfaces; all inputs/outputs are cryptographically hashed.
- Sidecar Execution of Verifiers/Auditors: Audit and Verifier Stack components are implemented as sidecar services, communicating through low-latency RPC.
- Scalability: Each agent maintains a local provenance log; cross-agent attestation relies on signed receipt hashes or Pedersen commitments for privacy. Federated verifiability allows organizations to exchange proofs-of-action without disclosing sensitive payloads.
- Human-in-the-Loop: All architectures support live scoring, fine-grained traceability, and override mechanisms for intent specification and anomaly triage.
Total cryptographic overhead remains moderate: under 10 ms per typical action, and roughly 30–50 ms for high-risk operations that require zero-knowledge proof generation. Practical mechanisms for Merkle-based logging, capability isolation, and energy governance have been demonstrated to be compatible with both resource-constrained and multi-agent settings (Gupta, 19 Dec 2025, Zhang, 23 Feb 2026).
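The idea of exchanging proofs-of-action without disclosing payloads can be illustrated with a toy Pedersen commitment. The parameters below are deliberately tiny and insecure, chosen only so the arithmetic is easy to follow; a real system would use a large prime-order group in which nobody knows the discrete logarithm of `H` with respect to `G`. The names and the idea of committing to a reduced receipt digest are assumptions made for this example.

```python
# Toy group parameters (INSECURE, illustration only): P = 2*Q + 1 with Q prime,
# and G, H both generate the subgroup of order Q modulo P.
P, Q, G, H = 23, 11, 4, 9


def commit(message: int, blinding: int) -> int:
    """Pedersen commitment C = G^message * H^blinding mod P."""
    return (pow(G, message % Q, P) * pow(H, blinding % Q, P)) % P


def opens_to(commitment: int, message: int, blinding: int) -> bool:
    """Check a claimed opening (message, blinding) against a commitment."""
    return commitment == commit(message, blinding)


# An agent commits to a (hypothetical) receipt digest reduced modulo Q,
# shares only the commitment, and reveals the opening if challenged.
c1 = commit(3, 7)
c2 = commit(5, 2)

honest_opening = opens_to(c1, 3, 7)
wrong_opening = opens_to(c1, 4, 7)

# Commitments are additively homomorphic: multiplying two commitments
# yields a commitment to the sum of messages under the sum of blindings.
combined = (c1 * c2) % P
homomorphic = combined == commit(3 + 5, 7 + 2)
```

The blinding value hides the message from anyone who sees only the commitment, while the opening check lets a verifier confirm a later disclosure against what was committed earlier.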
VeriAct, across its instantiations, systematically unifies cryptographic attestations, statistical and symbolic auditing, agentic specification repair, and formal regulatory mapping to achieve strong, quantitative, and operational guarantees of verifiability, alignment, and compliance in contemporary autonomous AI systems.