VeriGuard: Dual-Stage Safety Framework

Updated 5 January 2026
  • VeriGuard is a dual-stage framework that guarantees safety for LLM agents through formal specification, code synthesis, empirical testing, and runtime enforcement.
  • It integrates offline policy synthesis with online monitoring, ensuring safety, privacy, and adherence to user-defined constraints in domains like healthcare and finance.
  • Through iterative refinement combining empirical counterexample testing and symbolic verification, VeriGuard achieves provable correctness with zero attack success rates.

VeriGuard is a dual-stage framework that provides provable safety guarantees for LLM-based autonomous agents through rigorous formal specification, code synthesis, empirical and symbolic verification, and continuous runtime enforcement. Designed for high-assurance operation in safety- and privacy-critical settings, VeriGuard integrates both natural-language intent formalization and program verification—yielding policies that are correct by construction and enforced at runtime. The framework bridges informal heuristic guardrails and formal methods, facilitating trustworthy agent deployment in domains such as healthcare, finance, and compliance (Miculicich et al., 3 Oct 2025).

1. Formal Safety Specification and Modeling

VeriGuard begins by formalizing user intent into precise safety requirements. Let $r$ denote a natural-language safety request (e.g., “The agent must never send emails to non-company addresses”), and let $\mathcal{S}$ represent the agent’s specification, encompassing input/output types, available tools, and environmental assumptions.

A behavioral policy function is synthesized:

$$p : \mathcal{P} \to \{\texttt{allow}, \texttt{deny}\}$$

where $\mathcal{P}$ is the runtime schema of argument tuples (such as `{recipients: list[str], ...}`). Logical safety constraints are derived from $r$ and $\mathcal{S}$:

$$C = \{c_1, \dots, c_n\}$$

Each $c_i : \mathcal{P} \to \{\mathit{true}, \mathit{false}\}$ embodies a Boolean safety predicate. The required correctness property is:

$$p \models C \quad \equiv \quad \forall\, \pi \in \mathcal{P}.\ \Big(p(\pi) = \texttt{allow} \implies \bigwedge_{c \in C} c(\pi)\Big)$$

This ensures every allowed action satisfies all safety constraints.
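As a minimal illustrative sketch of this structure (the predicate, schema, and company domain below are hypothetical, not taken from the paper), the policy $p$ and constraint set $C$ for the email example might look like:

```python
# Hypothetical sketch of a synthesized policy for the rule
# "never send emails to non-company addresses".
COMPANY_DOMAIN = "@example.com"  # assumed placeholder domain

def c_company_only(args: dict) -> bool:
    """Safety predicate c_1: every recipient is a company address."""
    return all(r.endswith(COMPANY_DOMAIN) for r in args["recipients"])

CONSTRAINTS = [c_company_only]

def policy(args: dict) -> str:
    """p : P -> {allow, deny}; allow only if every constraint holds."""
    return "allow" if all(c(args) for c in CONSTRAINTS) else "deny"
```

By construction, `policy` can only return `"allow"` when the conjunction of all predicates in `CONSTRAINTS` is true, mirroring the correctness property $p \models C$.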

Agent execution is modeled as a transition system:

$$\mathcal{T} = (X, A, T)$$

where $X$ is the global state space, $A$ is the set of primitive actions, and $T : X \times A \to X$ is the transition function. Constraints can be phrased as invariants over $\mathcal{T}$, e.g., in LTL:

$$\mathbf{G}\, \neg\mathit{unsafe}, \quad \text{where} \quad \mathit{unsafe}(x, a) = \neg \bigwedge_{i} c_i(\mathit{args}(x, a))$$
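The invariant $\mathbf{G}\,\neg\mathit{unsafe}$ can be checked over a concrete execution trace; the sketch below is illustrative (the state/action shapes and the constraint are assumptions, not the paper's actual schema):

```python
# Check the LTL invariant G ¬unsafe over a concrete trace of the
# transition system (X, A, T); all names here are illustrative.
def unsafe(state, action, constraints):
    """unsafe(x, a): some constraint fails on the action's arguments."""
    return not all(c(action["args"]) for c in constraints)

def trace_satisfies_invariant(trace, constraints):
    """G ¬unsafe: no step (x, a) of the trace is unsafe."""
    return all(not unsafe(x, a, constraints) for (x, a) in trace)

# Example: a constraint requiring a non-empty recipient list.
cs = [lambda args: len(args.get("recipients", [])) > 0]
good_trace = [({}, {"args": {"recipients": ["a@example.com"]}})]
bad_trace = good_trace + [({}, {"args": {"recipients": []}})]
```

A trace containing any step whose arguments violate a constraint falsifies the invariant, which is exactly the event the runtime monitor must prevent.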

2. Offline Policy Synthesis and Verification

The offline stage assembles, tests, and formally verifies the agent’s behavioral policy. The process proceeds as follows:

Procedure SYNTHESIZE_POLICY(r, 𝒮):
    (p₀, 𝒫) ← G(r, 𝒮)
    C₀ ← H(r, 𝒮)
    t ← 0
    loop:
        Q ← Vₐ(p_t, C_t)
        if user_feedback_available:
            (A, R) ← V_d(Q, user_feedback)
        else:
            (A, R) ← V_d(Q, Ω(Q))
        Tests ← TEST_GEN(p_t, 𝒮, r)
        e ← run_tests(p_t, Tests)
        if e ≠ ∅:
            (p_{t+1}, C_{t+1}) ← G_{t+1}(r, 𝒮, R, A, e, p_t)
            t ← t + 1
            continue
        embed_pre_post_conditions(p_t, C_t)
        result, counterexample ← VERIFY_NAGINI(p_t)
        if result == VERIFIED:
            return (p_t, C_t)
        else:
            (p_{t+1}, C_{t+1}) ← G_{t+1}(r, 𝒮, R, A, counterexample, p_t)
            t ← t + 1

Key components include:

  • $G_t$: LLM-driven code and schema generator
  • $H_t$: LLM-driven constraint generator
  • $V_a$, $V_d$: validator analysis and disambiguation
  • TEST_GEN: automated PyTest test-case generation
  • VERIFY_NAGINI: symbolic verifier via Nagini/Viper

Verification relies on Hoare logic contracts:

$$\{\,C_{\mathrm{pre}}\,\}\; p \;\{\,C_{\mathrm{post}}\,\}$$

Nagini generates proof obligations:

$$\forall\, \pi.\ C_{\mathrm{pre}}(\pi) \implies \mathrm{exec}(p, \pi) \models C_{\mathrm{post}}$$

Counterexamples from either empirical testing or symbolic proof inform policy refinement. Iteration continues until both the test suite and the symbolic verifier succeed.
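To make the contract shape concrete, here is a runtime-checked analogue of $\{C_{\mathrm{pre}}\}\, p\, \{C_{\mathrm{post}}\}$. Note the hedge: Nagini discharges such contracts statically for all inputs, whereas this decorator only checks them dynamically per call; the decorator and predicates are illustrative assumptions, not Nagini's API.

```python
# Runtime-checked analogue of a Hoare contract {C_pre} p {C_post}.
# Nagini proves such contracts statically; this sketch merely asserts
# them on each call (names and predicates are assumptions).
import functools

def contract(pre, post):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapped(args):
            assert pre(args), "precondition C_pre violated"
            result = fn(args)
            assert post(args, result), "postcondition C_post violated"
            return result
        return wrapped
    return decorate

@contract(pre=lambda args: "recipients" in args,
          post=lambda args, res: res in {"allow", "deny"})
def policy(args):
    ok = all(r.endswith("@example.com") for r in args["recipients"])
    return "allow" if ok else "deny"
```

The static proof obligation quantifies over all $\pi$ satisfying $C_{\mathrm{pre}}$; the dynamic version above can only witness violations on inputs it actually sees.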

3. Iterative Refinement and Policy Correctness

The framework’s development cycle merges empirical and formal methods:

  • Testing → Counterexample: Failures encountered during PyTest-driven empirical tests produce concrete argument tuples $\pi^*$ violating expected behavior.
  • Formal Proof → Counterexample: Symbolic verification (Nagini/Viper) may yield logical witnesses showing policy non-conformance to postconditions.
  • Refinement Loop: These counterexamples are used to update the LLM’s synthesized code and constraints, converging toward a correct-by-construction policy.

Verified success establishes the theorem:

$$\forall\, \pi \in \mathcal{P}.\ \big(C_{\mathrm{pre}}(\pi) \implies \mathrm{exec}(p, \pi) \models C_{\mathrm{post}}\big)$$

Combining formal guarantees with empirical coverage, VeriGuard ensures that policy enforcement strictly matches user intent.

4. Online Runtime Monitoring and Enforcement

During agent deployment, VeriGuard’s runtime monitor validates all agent-proposed actions:

  • Argument Extraction: Raw agent data is parsed (via an LLM extractor) into an argument tuple $\pi$.
  • Policy Invocation: The verified policy function $p(\pi)$ determines allow or deny.
  • Enforcement: If allowed, the action executes; if denied, one of Task Termination (TT), Action Blocking (AB), Tool Execution Halt (TEH), or Collaborative Re-planning (CRP) is triggered.

All checks are constant-time Boolean predicate evaluations $\bigwedge_i c_i(\pi)$, ensuring lightweight operation. Formally, the decision logic is:

$$\mathrm{decide}(a) = \begin{cases} \texttt{allow}, & \bigwedge_{c \in C} c(f(\mathit{data}(a))) \\ \texttt{deny}, & \text{otherwise} \end{cases}$$
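A minimal sketch of this decision rule follows; the extractor and constraint are illustrative assumptions (in the deployed system, the extractor is LLM-based and the constraints come from the verified policy):

```python
# Sketch of the runtime monitor's decision: allow an action only if
# every constraint holds on its extracted argument tuple.
def decide(action, extract_args, constraints):
    args = extract_args(action)  # f(data(a))
    return "allow" if all(c(args) for c in constraints) else "deny"

# Example: identity extractor and a single company-domain constraint.
constraints = [lambda a: all(r.endswith("@example.com")
                             for r in a.get("recipients", []))]
```

Because each constraint is a simple predicate over already-extracted arguments, the per-action overhead is a fixed number of Boolean evaluations.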

5. Technical Architecture and Toolchain

The following summarizes the components and data flow:

| Component | Role | Technology |
| --- | --- | --- |
| LLM Agent | Proposes actions | GPT-4o, Gemini |
| Argument Extractor | Parses agent data to structured arguments | LLM-based |
| Policy Function | Encodes allow/deny logic (Python, contracts) | Nagini/Viper |
| Runtime Monitor | Intercepts and enforces agent actions | Python, Boolean logic |
| Execution Backend | Performs permitted tool/API actions | Platform-dependent |

Verification tooling includes Nagini (static Python verifier using Viper), PyTest (test generation/execution), and LLM APIs for code and constraint synthesis.

6. Benchmark Evaluation and Formal Guarantees

VeriGuard is evaluated against security and access-control benchmarks:

| Defense | ASR ↓ | TSR ↑ |
| --- | --- | --- |
| No defense | 51.9% | 42.1% |
| GuardRail | 0.0% | 40.2% |
| VeriGuard (Flash) | 0.0% | 63.3% |

On ASB (Agent Security Bench), VeriGuard consistently achieves a zero attack success rate (ASR) while attaining the highest task success rate (TSR) among the evaluated defenses. On the EICU-AC access-control task, VeriGuard policies achieve perfect accuracy, precision, and recall (100%/100%/100%), and they remain competitive on Mind2Web-SC.

| Method | EICU-AC Acc | EICU-AC P | EICU-AC R | Mind2Web-SC Acc | Mind2Web-SC P | Mind2Web-SC R |
| --- | --- | --- | --- | --- | --- | --- |
| GuardAgent (GPT-4) | 98.7% | 100% | 97.5% | 90.0% | 100% | 80.0% |
| AGrail (GPT-4o) | 97.8% | 97.5% | 98.1% | 98.4% | 99.0% | 98.0% |
| VeriGuard (GPT-4o) | 100% | 100% | 100% | 95.1% | 91.3% | 99.0% |

VeriGuard’s formal methodology guarantees that safety constraints are upheld by construction, in contrast to purely heuristic guardrails. The repeated synthesis–verification cycle drives the attack success rate to zero while preserving or improving utility metrics.

7. Framework Significance and Implications

VeriGuard’s contribution lies in its fusion of formal specification, policy/code synthesis via LLMs, empirical validation, static verification, and constant-time runtime monitoring. This end-to-end approach establishes a trustworthy pipeline for constructing and deploying LLM agents with rigorously enforced safety invariants.

This suggests an emerging paradigm where agent safety is not just empirically observed but formally proven. A plausible implication is the generalization of VeriGuard’s methodology to broader classes of model-driven agents and safety-critical domains, advancing the field toward scalable formal assurance in machine autonomy (Miculicich et al., 3 Oct 2025).
