VeriGuard: Dual-Stage Safety Framework
- VeriGuard is a dual-stage framework that guarantees safety for LLM agents through formal specification, code synthesis, empirical testing, and runtime enforcement.
- It integrates offline policy synthesis with online monitoring, ensuring safety, privacy, and adherence to user-defined constraints in domains like healthcare and finance.
- Through iterative refinement combining empirical counterexample testing and symbolic verification, VeriGuard achieves provable correctness with zero attack success rates.
VeriGuard is a dual-stage framework that provides provable safety guarantees for LLM-based autonomous agents through rigorous formal specification, code synthesis, empirical and symbolic verification, and continuous runtime enforcement. Designed for high-assurance operation in safety- and privacy-critical settings, VeriGuard integrates both natural-language intent formalization and program verification—yielding policies that are correct by construction and enforced at runtime. The framework bridges informal heuristic guardrails and formal methods, facilitating trustworthy agent deployment in domains such as healthcare, finance, and compliance (Miculicich et al., 3 Oct 2025).
1. Formal Safety Specification and Modeling
VeriGuard begins by formalizing user intent into precise safety requirements. Let r denote a natural-language safety request (e.g., “The agent must never send emails to non-company addresses”), and let 𝒮 represent the agent’s specification, encompassing input/output types, available tools, and environmental assumptions.
A behavioral policy function p: 𝒜 → {allow, deny} is synthesized, where 𝒜 is the runtime schema of argument tuples (such as {recipients: list[str], ...}). Logical safety constraints C = {c₁, …, c_n} are derived from r and 𝒮. Each cᵢ: 𝒜 → {true, false} embodies a Boolean safety predicate. The required correctness property is: ∀a ∈ 𝒜: p(a) = allow ⟹ c₁(a) ∧ … ∧ c_n(a). This ensures every allowed action satisfies all safety constraints.
Agent execution is modeled as a transition system M = (S, A, T), where S is the global state space, A is the set of primitive actions, and T ⊆ S × A × S is a transition relation. Constraints can be phrased as invariants over S, e.g., in LTL: □(exec(a) ⟹ ⋀ᵢ cᵢ(a)).
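As a concrete illustration of the policy-and-constraints shape (a hand-written sketch for the email example above — in VeriGuard the policy and constraints are synthesized by an LLM, and the domain name here is a hypothetical stand-in):

```python
# Illustrative sketch, not the paper's synthesized code: a policy p over the
# argument schema 𝒜 = {recipients: list[str], ...} for the request
# "the agent must never send emails to non-company addresses".

COMPANY_DOMAIN = "@example.com"  # assumed environment parameter


def c_company_only(args: dict) -> bool:
    """Safety predicate c1: every recipient is a company address."""
    return all(r.endswith(COMPANY_DOMAIN) for r in args["recipients"])


CONSTRAINTS = [c_company_only]


def policy(args: dict) -> str:
    """Behavioral policy p: allow iff all constraints c_i hold on the tuple."""
    return "allow" if all(c(args) for c in CONSTRAINTS) else "deny"


print(policy({"recipients": ["alice@example.com"]}))  # allow
print(policy({"recipients": ["bob@other.org"]}))      # deny
```

By construction, `policy` returns `"allow"` only when the conjunction of constraints holds, which is exactly the correctness property ∀a: p(a) = allow ⟹ ⋀ᵢ cᵢ(a).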
2. Offline Policy Synthesis and Verification
The offline stage assembles, tests, and formally verifies the agent’s behavioral policy. The process proceeds as follows:
```
Procedure SYNTHESIZE_POLICY(r, 𝒮):
    (p₀, 𝒫₀) ← G₀(r, 𝒮)
    C₀ ← H₀(r, 𝒮)
    t ← 0
    loop:
        Q ← Vₐ(p_t, C_t)
        if user_feedback_available:
            (A, R) ← V_d(Q, user_feedback)
        else:
            (A, R) ← V_d(Q, Ω(Q))
        Tests ← TEST_GEN(p_t, 𝒮, r)
        e ← run_tests(p_t, Tests)
        if e ≠ ∅:
            (p_{t+1}, C_{t+1}) ← G_{t+1}(r, 𝒮, R, A, e, p_t)
            t ← t + 1
            continue
        embed_pre_post_conditions(p_t, C_t)
        result, counterexample ← VERIFY_NAGINI(p_t)
        if result == VERIFIED:
            return (p_t, C_t)
        else:
            (p_{t+1}, C_{t+1}) ← G_{t+1}(r, 𝒮, R, A, counterexample, p_t)
            t ← t + 1
```
Key components include:
- G_t: LLM-driven code and schema generator
- H₀: LLM-driven constraint generator
- Vₐ, V_d: validator analysis and disambiguation
- TEST_GEN: automated PyTest test-case generation
- VERIFY_NAGINI: symbolic verifier via Nagini/Viper
Verification relies on Hoare-logic contracts of the form {Pre} p {Post}, embedded as pre-/postconditions in the policy code. Nagini translates these into proof obligations discharged by the Viper backend, e.g., that p(a) = allow implies ⋀ᵢ cᵢ(a) on every execution path. Counterexamples from either empirical testing or symbolic proof inform policy refinement. Iteration continues until both the test suite and symbolic verifier succeed.
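The contract shape can be sketched as follows. This is an emulation with plain assertions so it runs without the verifier installed; real VeriGuard policies carry `Requires(...)`/`Ensures(...)` annotations from `nagini_contracts`, which Nagini checks statically rather than at runtime:

```python
# Sketch of the Hoare-triple shape {Pre} p {Post}, emulated with runtime
# assertions. Nagini would prove these statically for all inputs; here they
# are merely checked on each call, for illustration only.
from typing import List


def send_policy(recipients: List[str]) -> bool:
    # Precondition: a well-formed argument tuple (all recipients are strings).
    assert all(isinstance(r, str) for r in recipients)

    allowed = bool(recipients) and all(
        r.endswith("@example.com") for r in recipients
    )

    # Postcondition: allow implies every safety predicate holds
    # (here, the single constraint "company addresses only").
    assert not allowed or all(r.endswith("@example.com") for r in recipients)
    return allowed


print(send_policy(["a@example.com", "b@example.com"]))  # True
print(send_policy(["x@other.org"]))                     # False
```

Static verification of the same contract would guarantee the postcondition for every input, not just the ones exercised at runtime — that gap is precisely what distinguishes the symbolic stage from the empirical one.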
3. Iterative Refinement and Policy Correctness
The framework’s development cycle merges empirical and formal methods:
- Testing → Counterexample: Failures encountered during pytest-driven empirical tests produce concrete argument tuples violating expected behavior.
- Formal Proof → Counterexample: Symbolic verification (Nagini/Viper) may yield logical witnesses showing policy non-conformance to postconditions.
- Refinement Loop: These counterexamples are used to update the LLM’s synthesized code and constraints, converging toward a correct-by-construction policy.
Verified success establishes the theorem ∀a ∈ 𝒜: p(a) = allow ⟹ ⋀ᵢ cᵢ(a). Combining formal guarantees and empirical coverage, VeriGuard ensures policy enforcement strictly matches user intent.
4. Online Runtime Monitoring and Enforcement
During agent deployment, VeriGuard’s runtime monitor validates all agent-proposed actions:
- Argument Extraction: Raw agent data is parsed (via an LLM extractor) into an argument tuple a ∈ 𝒜.
- Policy Invocation: The verified policy function p(a) determines allow or deny.
- Enforcement: If allowed, the action executes; if denied, one of Task Termination (TT), Action Blocking (AB), Tool Execution Halt (TEH), or Collaborative Re-planning (CRP) is triggered.
All checks are constant-time Boolean predicate evaluations cᵢ(a), ensuring lightweight operation. Formally, the decision logic is: execute the action if p(a) = allow; otherwise invoke the configured fallback (TT, AB, TEH, or CRP).
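A minimal monitor sketch (the extractor and execution backend are mocked here; in VeriGuard the arguments come from an LLM extractor and the backend is the agent's real tool layer):

```python
# Runtime-monitor sketch: intercept each proposed action, evaluate the
# verified policy (a cheap Boolean predicate check), and either execute
# the action or return an enforcement fallback code.

def policy(args):
    """Stand-in for the verified policy p(a)."""
    recips = args.get("recipients", [])
    return bool(recips) and all(r.endswith("@example.com") for r in recips)


def execute(action):
    """Stand-in for the execution backend (tool/API call)."""
    return f"sent: {action['args']['recipients']}"


def monitor(action, fallback="AB"):
    """Allow -> execute; deny -> one of TT, AB, TEH, CRP (the paper's
    enforcement options). Action Blocking (AB) is the default here."""
    args = action["args"]  # in VeriGuard, parsed by the LLM extractor
    if policy(args):
        return execute(action)
    return fallback


print(monitor({"args": {"recipients": ["a@example.com"]}}))  # sent: [...]
print(monitor({"args": {"recipients": ["x@other.org"]}}))    # AB
```

Because the monitor only evaluates already-verified predicates, its per-action overhead is a single policy call, consistent with the constant-time claim above.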
5. Technical Architecture and Toolchain
The following summarizes the components and data flow:
| Component | Role | Technology |
|---|---|---|
| LLM Agent | Proposes actions | GPT-4o, Gemini |
| Argument Extractor | Parses agent data to structured arguments | LLM-based |
| Policy Function | Encodes allow/deny logic (Python, contracts) | Nagini/Viper |
| Runtime Monitor | Intercepts/enforces agent actions | Python, Boolean logic |
| Execution Backend | Performs permitted tool/API actions | Platform-dependent |
Verification tooling includes Nagini (static Python verifier using Viper), PyTest (test generation/execution), and LLM APIs for code and constraint synthesis.
6. Benchmark Evaluation and Formal Guarantees
VeriGuard is evaluated against security and access-control benchmarks:
| Defense | ASR ↓ | TSR ↑ |
|---|---|---|
| No defense | 51.9% | 42.1% |
| GuardRail | 0.0% | 40.2% |
| VeriGuard (Flash) | 0.0% | 63.3% |
On ASB (Agent Security Bench), VeriGuard consistently achieves a zero attack success rate (ASR) while attaining the highest task success rate (TSR) among the evaluated defenses. On the access-control tasks, VeriGuard reaches perfect accuracy, precision, and recall on EICU-AC (100%/100%/100%), and 95.1% accuracy with 99.0% recall on Mind2Web-SC.
| Method | EICU-AC Acc | EICU-AC P | EICU-AC R | Mind2Web-SC Acc | Mind2Web-SC P | Mind2Web-SC R |
|---|---|---|---|---|---|---|
| GuardAgent (GPT-4) | 98.7% | 100% | 97.5% | 90.0% | 100% | 80.0% |
| AGrail (GPT-4o) | 97.8% | 97.5% | 98.1% | 98.4% | 99.0% | 98.0% |
| VeriGuard (GPT-4o) | 100% | 100% | 100% | 95.1% | 91.3% | 99.0% |
VeriGuard’s formal methodology guarantees that safety constraints are upheld by construction, in contrast to purely heuristic guardrails. The repeated synthesis–verification cycle drives the attack success rate to zero while preserving or improving utility metrics.
7. Framework Significance and Implications
VeriGuard’s contribution lies in its fusion of formal specification, policy/code synthesis via LLMs, empirical validation, static verification, and constant-time runtime monitoring. This end-to-end approach establishes a trustworthy pipeline for constructing and deploying LLM agents with rigorously enforced safety invariants.
This suggests an emerging paradigm where agent safety is not just empirically observed but formally proven. A plausible implication is the generalization of VeriGuard’s methodology to broader classes of model-driven agents and safety-critical domains, advancing the field toward scalable formal assurance in machine autonomy (Miculicich et al., 3 Oct 2025).