VeriGuard: Dual-Stage Safety Framework
- VeriGuard is a dual-stage framework that guarantees safety for LLM agents through formal specification, code synthesis, empirical testing, and runtime enforcement.
- It integrates offline policy synthesis with online monitoring, ensuring safety, privacy, and adherence to user-defined constraints in domains like healthcare and finance.
- Through iterative refinement combining empirical counterexample testing and symbolic verification, VeriGuard achieves provable correctness with zero attack success rates.
VeriGuard is a dual-stage framework that provides provable safety guarantees for LLM-based autonomous agents through rigorous formal specification, code synthesis, empirical and symbolic verification, and continuous runtime enforcement. Designed for high-assurance operation in safety- and privacy-critical settings, VeriGuard integrates both natural-language intent formalization and program verification—yielding policies that are correct by construction and enforced at runtime. The framework bridges informal heuristic guardrails and formal methods, facilitating trustworthy agent deployment in domains such as healthcare, finance, and compliance (Miculicich et al., 3 Oct 2025).
1. Formal Safety Specification and Modeling
VeriGuard begins by formalizing user intent into precise safety requirements. Let r denote a natural-language safety request (e.g., “The agent must never send emails to non-company addresses”), and let 𝒮 represent the agent’s specification, encompassing input/output types, available tools, and environmental assumptions.
A behavioral policy function p: 𝒜 → {allow, deny} is synthesized, where 𝒜 is the runtime schema of argument tuples (such as {recipients: list[str], ...}). Logical safety constraints C = {c₁, …, c_n} are derived from r and 𝒮. Each cᵢ: 𝒜 → {true, false} embodies a Boolean safety predicate. The required correctness property is: ∀a ∈ 𝒜: p(a) = allow ⟹ c₁(a) ∧ … ∧ c_n(a). This ensures every allowed action satisfies all safety constraints.
Agent execution is modeled as a transition system M = (S, A, T), where S is the global state space, A is the set of primitive actions, and T ⊆ S × A × S is a transition relation. Constraints can be phrased as invariants over S, e.g., in LTL: □(exec(a) ⟹ ⋀ᵢ cᵢ(a)).
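As a concrete illustration of the policy-and-constraints shape (a hand-written sketch for the email example above — in VeriGuard the policy and constraints are synthesized by an LLM, and the domain name here is a hypothetical stand-in):

```python
# Illustrative sketch, not the paper's synthesized code: a policy p over the
# argument schema 𝒜 = {recipients: list[str], ...} for the request
# "the agent must never send emails to non-company addresses".

COMPANY_DOMAIN = "@example.com"  # assumed environment parameter


def c_company_only(args: dict) -> bool:
    """Safety predicate c1: every recipient is a company address."""
    return all(r.endswith(COMPANY_DOMAIN) for r in args["recipients"])


CONSTRAINTS = [c_company_only]


def policy(args: dict) -> str:
    """Behavioral policy p: allow iff all constraints c_i hold on the tuple."""
    return "allow" if all(c(args) for c in CONSTRAINTS) else "deny"


print(policy({"recipients": ["alice@example.com"]}))  # allow
print(policy({"recipients": ["bob@other.org"]}))      # deny
```

By construction, `policy` returns `"allow"` only when the conjunction of constraints holds, which is exactly the correctness property ∀a: p(a) = allow ⟹ ⋀ᵢ cᵢ(a).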
2. Offline Policy Synthesis and Verification
The offline stage assembles, tests, and formally verifies the agent’s behavioral policy. The process proceeds as follows:
```
Procedure SYNTHESIZE_POLICY(r, 𝒮):
    (p₀, 𝒫₀) ← G₀(r, 𝒮)
    C₀ ← H₀(r, 𝒮)
    t ← 0
    loop:
        Q ← Vₐ(p_t, C_t)
        if user_feedback_available:
            (A, R) ← V_d(Q, user_feedback)
        else:
            (A, R) ← V_d(Q, Ω(Q))
        Tests ← TEST_GEN(p_t, 𝒮, r)
        e ← run_tests(p_t, Tests)
        if e ≠ ∅:
            (p_{t+1}, C_{t+1}) ← G_{t+1}(r, 𝒮, R, A, e, p_t)
            t ← t + 1
            continue
        embed_pre_post_conditions(p_t, C_t)
        result, counterexample ← VERIFY_NAGINI(p_t)
        if result == VERIFIED:
            return (p_t, C_t)
        else:
            (p_{t+1}, C_{t+1}) ← G_{t+1}(r, 𝒮, R, A, counterexample, p_t)
            t ← t + 1
```
Key components include:
- G_t: LLM-driven code and schema generator
- H₀: LLM-driven constraint generator
- Vₐ, V_d: validator analysis and disambiguation
- TEST_GEN: automated PyTest test-case generation
- VERIFY_NAGINI: symbolic verifier via Nagini/Viper
Verification relies on Hoare-logic contracts of the form {Pre} p {Post}, embedded as pre-/postconditions in the policy code. Nagini translates these into proof obligations discharged by the Viper backend, e.g., that p(a) = allow implies ⋀ᵢ cᵢ(a) on every execution path. Counterexamples from either empirical testing or symbolic proof inform policy refinement. Iteration continues until both the test suite and symbolic verifier succeed.
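The contract shape can be sketched as follows. This is an emulation with plain assertions so it runs without the verifier installed; real VeriGuard policies carry `Requires(...)`/`Ensures(...)` annotations from `nagini_contracts`, which Nagini checks statically rather than at runtime:

```python
# Sketch of the Hoare-triple shape {Pre} p {Post}, emulated with runtime
# assertions. Nagini would prove these statically for all inputs; here they
# are merely checked on each call, for illustration only.
from typing import List


def send_policy(recipients: List[str]) -> bool:
    # Precondition: a well-formed argument tuple (all recipients are strings).
    assert all(isinstance(r, str) for r in recipients)

    allowed = bool(recipients) and all(
        r.endswith("@example.com") for r in recipients
    )

    # Postcondition: allow implies every safety predicate holds
    # (here, the single constraint "company addresses only").
    assert not allowed or all(r.endswith("@example.com") for r in recipients)
    return allowed


print(send_policy(["a@example.com", "b@example.com"]))  # True
print(send_policy(["x@other.org"]))                     # False
```

Static verification of the same contract would guarantee the postcondition for every input, not just the ones exercised at runtime — that gap is precisely what distinguishes the symbolic stage from the empirical one.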
3. Iterative Refinement and Policy Correctness
The framework’s development cycle merges empirical and formal methods:
- Testing → Counterexample: Failures encountered during pytest-driven empirical tests produce concrete argument tuples violating expected behavior.
- Formal Proof → Counterexample: Symbolic verification (Nagini/Viper) may yield logical witnesses showing policy non-conformance to postconditions.
- Refinement Loop: These counterexamples are used to update the LLM’s synthesized code and constraints, converging toward a correct-by-construction policy.
Verified success establishes the theorem ∀a ∈ 𝒜: p(a) = allow ⟹ ⋀ᵢ cᵢ(a). Combining formal guarantees and empirical coverage, VeriGuard ensures policy enforcement strictly matches user intent.
4. Online Runtime Monitoring and Enforcement
During agent deployment, VeriGuard’s runtime monitor validates all agent-proposed actions:
- Argument Extraction: Raw agent data is parsed (via an LLM extractor) into an argument tuple a ∈ 𝒜.
- Policy Invocation: The verified policy function p(a) determines allow or deny.
- Enforcement: If allowed, the action executes; if denied, one of Task Termination (TT), Action Blocking (AB), Tool Execution Halt (TEH), or Collaborative Re-planning (CRP) is triggered.
All checks are constant-time Boolean predicate evaluations cᵢ(a), ensuring lightweight operation. Formally, the decision logic is: execute the action if p(a) = allow; otherwise invoke the configured fallback (TT, AB, TEH, or CRP).
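A minimal monitor sketch (the extractor and execution backend are mocked here; in VeriGuard the arguments come from an LLM extractor and the backend is the agent's real tool layer):

```python
# Runtime-monitor sketch: intercept each proposed action, evaluate the
# verified policy (a cheap Boolean predicate check), and either execute
# the action or return an enforcement fallback code.

def policy(args):
    """Stand-in for the verified policy p(a)."""
    recips = args.get("recipients", [])
    return bool(recips) and all(r.endswith("@example.com") for r in recips)


def execute(action):
    """Stand-in for the execution backend (tool/API call)."""
    return f"sent: {action['args']['recipients']}"


def monitor(action, fallback="AB"):
    """Allow -> execute; deny -> one of TT, AB, TEH, CRP (the paper's
    enforcement options). Action Blocking (AB) is the default here."""
    args = action["args"]  # in VeriGuard, parsed by the LLM extractor
    if policy(args):
        return execute(action)
    return fallback


print(monitor({"args": {"recipients": ["a@example.com"]}}))  # sent: [...]
print(monitor({"args": {"recipients": ["x@other.org"]}}))    # AB
```

Because the monitor only evaluates already-verified predicates, its per-action overhead is a single policy call, consistent with the constant-time claim above.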
5. Technical Architecture and Toolchain
The following summarizes the components and data flow:
| Component | Role | Technology |
|---|---|---|
| LLM Agent | Proposes actions | GPT-4o, Gemini |
| Argument Extractor | Parses agent data to structured arguments | LLM-based |
| Policy Function | Encodes allow/deny logic (Python, contracts) | Nagini/Viper |
| Runtime Monitor | Intercepts/enforces agent actions | Python, Boolean logic |
| Execution Backend | Performs permitted tool/API actions | Platform-dependent |
Verification tooling includes Nagini (static Python verifier using Viper), PyTest (test generation/execution), and LLM APIs for code and constraint synthesis.
6. Benchmark Evaluation and Formal Guarantees
VeriGuard is evaluated against security and access-control benchmarks:
| Defense | ASR ↓ | TSR ↑ |
|---|---|---|
| No defense | 51.9% | 42.1% |
| GuardRail | 0.0% | 40.2% |
| VeriGuard (Flash) | 0.0% | 63.3% |
On ASB (Agent Security Bench), VeriGuard consistently achieves a zero attack success rate (ASR) while attaining the highest task success rate (TSR) among the evaluated defenses. On the access-control tasks, VeriGuard reaches perfect accuracy, precision, and recall on EICU-AC (100%/100%/100%), and 95.1% accuracy with 99.0% recall on Mind2Web-SC.
| Method | EICU-AC Acc | EICU-AC P | EICU-AC R | Mind2Web-SC Acc | Mind2Web-SC P | Mind2Web-SC R |
|---|---|---|---|---|---|---|
| GuardAgent (GPT-4) | 98.7% | 100% | 97.5% | 90.0% | 100% | 80.0% |
| AGrail (GPT-4o) | 97.8% | 97.5% | 98.1% | 98.4% | 99.0% | 98.0% |
| VeriGuard (GPT-4o) | 100% | 100% | 100% | 95.1% | 91.3% | 99.0% |
VeriGuard’s formal methodology guarantees that safety constraints are upheld by construction, in contrast to purely heuristic guardrails. The repeated synthesis–verification cycle drives the attack success rate to zero while preserving or improving utility metrics.
7. Framework Significance and Implications
VeriGuard’s contribution lies in its fusion of formal specification, policy/code synthesis via LLMs, empirical validation, static verification, and constant-time runtime monitoring. This end-to-end approach establishes a trustworthy pipeline for constructing and deploying LLM agents with rigorously enforced safety invariants.
This suggests an emerging paradigm where agent safety is not just empirically observed but formally proven. A plausible implication is the generalization of VeriGuard’s methodology to broader classes of model-driven agents and safety-critical domains, advancing the field toward scalable formal assurance in machine autonomy (Miculicich et al., 3 Oct 2025).