PSV-Verus: Verified Self-Play Code Synthesis

Updated 23 December 2025
  • PSV-Verus is a self-play algorithm for formally verified code generation that uses the Verus system to ensure Rust programs meet strict correctness criteria.
  • It alternates between proposing new formal specifications, synthesizing solutions, and verifying correctness, with dynamic difficulty adjustments to match solver progress.
  • The approach demonstrates significant performance gains on verified-code benchmarks by integrating rejection fine-tuning and expert iteration in a data-free curriculum process.

PSV-Verus refers to a self-play training algorithm for formally verified code generation, introduced in "Propose, Solve, Verify: Self-Play Through Formal Verification" (Wilf et al., 20 Dec 2025). PSV-Verus leverages formal verification—specifically the Verus system for Rust programs—to train LLMs on program synthesis tasks in the absence of human-written data. The core cycle alternates between generating new specifications (proposing), synthesizing programs (solving), and filtering via formal correctness (verifying), with difficulty-aware adaptation at every stage.

1. The Propose, Solve, Verify (PSV) Framework

PSV-Verus employs two neural modules: a "proposer" $P_{\phi_t}$ responsible for generating synthetic formal specifications, and a "solver" $S_{\theta_t}$ trained to synthesize Rust+Verus proofs meeting these specs. A sound formal verifier $v(x, y) \in \{0, 1\}$ determines if candidate program $y$ is provably correct with respect to input specification $x$. The self-play protocol operates as follows:

  • From a seed set $X_0$ of specs, the solver produces $k_{\mathrm{trn}}$ solutions for each $x \in X_t$.
  • Solutions are formally verified; only $(x, y)$ pairs passing $v(x, y)$ are retained.
  • The solver is fine-tuned via rejection sampling on verified data.
  • The proposer is conditioned on both new verified specs and their pass rates to propose more challenging or diverse problems.
  • The combined set of specs expands for the next iteration, closing the learning loop.

This self-play cycle is repeated for $T$ iterations, with the curriculum dynamically adapting as the solver improves.
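A minimal sketch of one such iteration in Python follows. This is illustrative only: the names `solve`, `verify`, and `propose` are hypothetical stand-ins for the solver LLM, the Verus verifier $v(x, y)$, and the difficulty-conditioned proposer, and the default sampling count is an assumption, not the paper's value.

```python
# Illustrative sketch of one PSV self-play iteration (not the paper's code).
from typing import Callable

def psv_iteration(
    specs: list[str],                                  # X_t: current spec pool
    solve: Callable[[str, int], list[str]],            # sample k programs per spec
    verify: Callable[[str, str], bool],                # sound verifier v(x, y)
    propose: Callable[[list[tuple[str, float]]], list[str]],
    k_trn: int = 8,                                    # k_trn (value assumed)
) -> tuple[list[tuple[str, str]], list[str]]:
    verified_pairs: list[tuple[str, str]] = []         # becomes D_t^*
    pass_rates: list[tuple[str, float]] = []           # (spec, r_{i,t})
    for x in specs:
        candidates = solve(x, k_trn)
        good = [y for y in candidates if verify(x, y)]
        verified_pairs.extend((x, y) for y in good)
        pass_rates.append((x, len(good) / k_trn))
    # Rejection fine-tuning on verified_pairs happens here (Section 4);
    # the proposer then expands the pool conditioned on pass rates.
    new_specs = propose(pass_rates)
    return verified_pairs, specs + new_specs
```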

2. Formal Verification as Supervision

Verus provides SMT-backed formal verification. By design, its verifier is sound: $v(x, y) = 1$ implies $y$ is provably correct for all permissible inputs of $x$ within the Verus logic. This contrasts sharply with test-suite-based feedback, which is vulnerable to reward hacking and incomplete behavioral coverage. In PSV-Verus, only programs passing formal verification are used for further training, guaranteeing that the learned model only imitates verified-correct construction strategies.
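As a concrete illustration of this binary reward, one might wrap the Verus command-line tool as in the sketch below; the exact `verus` invocation and the timeout policy are assumptions made for the example, not details from the paper.

```python
import pathlib
import subprocess
import tempfile

def verus_verifies(program: str, timeout_s: float = 60.0) -> bool:
    """Binary verification signal v(x, y): True iff Verus accepts `program`
    (which embeds the spec x as requires/ensures clauses). Assumes a `verus`
    executable on PATH that exits 0 exactly when verification succeeds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "candidate.rs"
        path.write_text(program)
        try:
            result = subprocess.run(
                ["verus", str(path)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # conservatively treat verifier timeouts as failures
        return result.returncode == 0
```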

3. Difficulty-Aware Proposal and Curriculum

PSV-Verus employs difficulty-adaptive curriculum learning. Specifications $x_i$ are bucketed into four difficulty classes based on pass rate $r_{i,t}$: Easy, Medium, Hard, and Impossible, using fixed thresholds $\tau_E$ and $\tau_M$:

$$\text{diff}(x_i, t) = \begin{cases} \text{Easy} & r_{i,t} \geq \tau_E \\ \text{Medium} & \tau_M \leq r_{i,t} < \tau_E \\ \text{Hard} & 0 < r_{i,t} < \tau_M \\ \text{Impossible} & r_{i,t} = 0 \end{cases}$$

The proposer conditions on a uniform sample of difficulty buckets when generating new specs, ensuring that the solver is exposed to a spectrum of tractable and challenging problems. This explicit difficulty balancing, paired with dynamic updates as pass rates change, creates an evolving curriculum that closely follows solver capabilities.
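In code, the bucketing reduces to threshold comparisons. The sketch below mirrors the case analysis above; the default threshold values are illustrative placeholders, since the paper fixes $\tau_E$ and $\tau_M$ but this summary does not state them.

```python
def difficulty(pass_rate: float, tau_e: float = 0.75, tau_m: float = 0.25) -> str:
    """Map a spec's solver pass rate r_{i,t} to a difficulty bucket.
    tau_e and tau_m correspond to tau_E and tau_M; the defaults here
    are assumed placeholders, not the paper's values."""
    if pass_rate >= tau_e:
        return "Easy"
    if pass_rate >= tau_m:      # tau_M <= r < tau_E
        return "Medium"
    if pass_rate > 0:           # 0 < r < tau_M
        return "Hard"
    return "Impossible"         # r == 0
```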

4. Expert Iteration via Rejection Fine-tuning

Solver training adopts a form of expert iteration or offline RL with binary reward, termed "rejection fine-tuning." Letting $D_t^*$ denote the set of formally verified $(x, y)$ pairs at iteration $t$:

$$D_t^* = \{(x_i, y^j_{i,t}) \mid v(x_i, y^j_{i,t}) = 1\}$$

The solver is trained using cross-entropy loss on this data, updating parameters $\theta$ via

$$\mathcal{L}_{\mathrm{solver}}(\theta) = -\frac{1}{|D_t^*|} \sum_{(x, y) \in D_t^*} \sum_{k=1}^{|y|} \log p_\theta(y[k] \mid x, y_{<k})$$

This paradigm prevents the propagation of unverified or spurious solutions, directly aligning model improvements with formal correctness.
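A minimal PyTorch rendering of this loss is sketched below, assuming a Hugging Face-style causal LM whose forward pass returns `.logits`, and the usual convention of masking spec tokens with label -100 so the sum runs over solution tokens $y$ only.

```python
import torch
import torch.nn.functional as F

def solver_loss(model, input_ids: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy L_solver over verified (x, y) pairs.
    `input_ids` holds spec tokens followed by solution tokens; `labels`
    copies input_ids but sets spec positions to -100, so only y tokens
    contribute, matching the sum over k = 1..|y| in the formula."""
    logits = model(input_ids).logits               # (batch, seq, vocab)
    shift_logits = logits[:, :-1, :].contiguous()  # position k predicts token k+1
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,                         # skip masked spec tokens
    )
```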

5. Model Architecture and Training Protocol

The backbone model for PSV-Verus is Qwen2.5-Coder-3B-Instruct, a 3B-parameter, decoder-only transformer. Solver fine-tuning uses LoRA adapters (lora_r = 16, lora_alpha = 32), targeting all major projection modules, with bfloat16 precision and gradient checkpointing for efficiency.

  • Solver: Fine-tuned on verified solutions for three epochs at a $2 \times 10^{-4}$ learning rate.
  • Proposer: Frozen weights; updated purely via in-context learning on novel, verified specifications and pass-rate labels.

Spec proposal and solution inference are performed using SGLang with dp-size 8 and temperature 0.8.
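The stated adapter settings could be expressed with Hugging Face `peft` roughly as follows. The target-module names are the standard projection layers of Qwen2.5-style models, inferred rather than quoted from the paper.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA settings as reported: lora_r = 16, lora_alpha = 32, targeting all
# major projection modules, with bfloat16 and gradient checkpointing.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-3B-Instruct",
    torch_dtype=torch.bfloat16,
)
model.gradient_checkpointing_enable()             # memory-saving, as described
model = get_peft_model(model, lora_config)
```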

6. Empirical Results, Scaling, and Ablation

PSV-Verus is evaluated on three Rust+Verus benchmarks: Dafny2Verus, MBPP-Verified, and HumanEval-Verified. The system achieves substantial performance improvements over inference-only or non-self-play baselines, with up to $9.6\times$ Pass@1 gains (e.g., 36.8% on MBPP compared to 3.8% for RFT in test-time training).

Table: Pass@1 Results Across Benchmarks

Method        Dafny2Verus   MBPP     HumanEval
AlphaVerus    24.1%         6.5%     7.2%
RFT           34.5%         3.8%     5.6%
PSV-Verus     65.6%         36.8%    19.1%

Scaling analysis indicates logarithmic improvements as the number of questions per iteration grows, with further gains from iterative over one-shot training. Ablations identify formal verification supervision as the most critical component: removing verification results in over 50% relative Pass@1 loss across benchmarks, while removing difficulty awareness or diversity from the proposer each causes a 5–11% drop.

7. Assumptions, Limitations, and Scope

  • Verifier properties: The effectiveness of PSV-Verus hinges on access to a sound (though potentially incomplete) specification verifier, such as Verus. Some correct programs may be rejected, capping coverage, but all accepted solutions are valid.
  • Compute requirements: The protocol is resource intensive, with solving and verifying thousands of specs per round requiring significant parallel compute (∼24 hours on 8× L40 GPUs for core experiments).
  • Spec filtering: A nontrivial fraction of candidate specs are invalid or un-compilable, necessitating robust pre-filtering procedures (e.g., compilation with stubs to ensure syntactic well-formedness).
  • Domain generality: PSV requires a verifiable formal specification language. Extending beyond domains with robust verifiers remains an open area.

A key implication is that this approach generalizes to any code or proof synthesis setting where formally checkable correctness is available; however, the necessity of such a verifier remains the main constraint.

8. Significance and Distinctiveness Within AI for Program Synthesis

PSV-Verus demonstrates that self-play, when mediated by formal verification, is sufficient for training performant code-generation LLMs from scratch, without any recourse to human-written problems or solutions (Wilf et al., 20 Dec 2025). This distinguishes it from game self-play or unit-test-driven code synthesis, which are vulnerable to over-fitting, reward exploitation, or propagation of human errors. By integrating difficulty-aware curricula, explicit verification gating, and rejection-based fine-tuning, PSV-Verus sets a new baseline for data-free curriculum development in verified programming domains.

References

  • Wilf et al., "Propose, Solve, Verify: Self-Play Through Formal Verification," 20 Dec 2025.
