PSV-Verus: Verified Self-Play Code Synthesis

Updated 23 December 2025
  • PSV-Verus is a self-play algorithm for formally verified code generation that uses the Verus system to ensure Rust programs meet strict correctness criteria.
  • It alternates between proposing new formal specifications, synthesizing solutions, and verifying correctness, with dynamic difficulty adjustments to match solver progress.
  • The approach demonstrates significant performance gains on verified-code benchmarks by integrating rejection fine-tuning and expert iteration in a data-free curriculum process.

PSV-Verus refers to a self-play training algorithm for formally verified code generation, introduced in "Propose, Solve, Verify: Self-Play Through Formal Verification" (Wilf et al., 20 Dec 2025). PSV-Verus leverages formal verification—specifically the Verus system for Rust programs—to train LLMs on program synthesis tasks in the absence of human-written data. The core cycle alternates between generating new specifications (proposing), synthesizing programs (solving), and filtering via formal correctness (verifying), with difficulty-aware adaptation at every stage.

1. The Propose, Solve, Verify (PSV) Framework

PSV-Verus employs two neural modules: a "proposer" $P_{\phi_t}$ responsible for generating synthetic formal specifications, and a "solver" $S_{\theta_t}$ trained to synthesize Rust+Verus proofs meeting these specs. A sound formal verifier $v(x, y) \in \{0, 1\}$ determines if candidate program $y$ is provably correct with respect to input specification $x$. The self-play protocol operates as follows:

  • From a seed set $X_0$ of specs, the solver produces $k_{\mathrm{trn}}$ solutions for each $x \in X_t$.
  • Solutions are formally verified; only $(x, y)$ pairs passing $v(x, y)$ are retained.
  • The solver is fine-tuned via rejection sampling on verified data.
  • The proposer is conditioned on both new verified specs and their pass rates to propose more challenging or diverse problems.
  • The combined set of specs expands for the next iteration, closing the learning loop.

This self-play cycle is repeated for $T$ iterations, with the curriculum dynamically adapting as the solver improves.
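A minimal sketch of one such iteration in Python follows. This is illustrative only: the names `solve`, `verify`, and `propose` are hypothetical stand-ins for the solver LLM, the Verus verifier $v(x, y)$, and the difficulty-conditioned proposer, and the default sampling count is an assumption, not the paper's value.

```python
# Illustrative sketch of one PSV self-play iteration (not the paper's code).
from typing import Callable

def psv_iteration(
    specs: list[str],                                  # X_t: current spec pool
    solve: Callable[[str, int], list[str]],            # sample k programs per spec
    verify: Callable[[str, str], bool],                # sound verifier v(x, y)
    propose: Callable[[list[tuple[str, float]]], list[str]],
    k_trn: int = 8,                                    # k_trn (value assumed)
) -> tuple[list[tuple[str, str]], list[str]]:
    verified_pairs: list[tuple[str, str]] = []         # becomes D_t^*
    pass_rates: list[tuple[str, float]] = []           # (spec, r_{i,t})
    for x in specs:
        candidates = solve(x, k_trn)
        good = [y for y in candidates if verify(x, y)]
        verified_pairs.extend((x, y) for y in good)
        pass_rates.append((x, len(good) / k_trn))
    # Rejection fine-tuning on verified_pairs happens here (Section 4);
    # the proposer then expands the pool conditioned on pass rates.
    new_specs = propose(pass_rates)
    return verified_pairs, specs + new_specs
```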

2. Formal Verification as Supervision

Verus provides SMT-backed formal verification. By design, its verifier is sound: $v(x, y) = 1$ implies $y$ is provably correct for all permissible inputs of $x$ within the Verus logic. This contrasts sharply with test-suite-based feedback, which is vulnerable to reward hacking and incomplete behavioral coverage. In PSV-Verus, only programs passing formal verification are used for further training, guaranteeing that the learned model only imitates verified-correct construction strategies.
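As a concrete illustration of this binary reward, one might wrap the Verus command-line tool as in the sketch below; the exact `verus` invocation and the timeout policy are assumptions made for the example, not details from the paper.

```python
import pathlib
import subprocess
import tempfile

def verus_verifies(program: str, timeout_s: float = 60.0) -> bool:
    """Binary verification signal v(x, y): True iff Verus accepts `program`
    (which embeds the spec x as requires/ensures clauses). Assumes a `verus`
    executable on PATH that exits 0 exactly when verification succeeds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "candidate.rs"
        path.write_text(program)
        try:
            result = subprocess.run(
                ["verus", str(path)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # conservatively treat verifier timeouts as failures
        return result.returncode == 0
```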

3. Difficulty-Aware Proposal and Curriculum

PSV-Verus employs difficulty-adaptive curriculum learning. Specifications $x_i$ are bucketed into four difficulty classes based on pass rate $r_{i,t}$: Easy, Medium, Hard, and Impossible, using fixed thresholds $\tau_E$ and $\tau_M$:

$$\text{diff}(x_i, t) = \begin{cases} \text{Easy} & r_{i,t} \geq \tau_E \\ \text{Medium} & \tau_M \leq r_{i,t} < \tau_E \\ \text{Hard} & 0 < r_{i,t} < \tau_M \\ \text{Impossible} & r_{i,t} = 0 \end{cases}$$

The proposer conditions on a uniform sample of difficulty buckets when generating new specs, ensuring that the solver is exposed to a spectrum of tractable and challenging problems. This explicit difficulty balancing, paired with dynamic updates as pass rates change, creates an evolving curriculum that closely follows solver capabilities.
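In code, the bucketing reduces to threshold comparisons. The sketch below mirrors the case analysis above; the default threshold values are illustrative placeholders, since the paper fixes $\tau_E$ and $\tau_M$ but this summary does not state them.

```python
def difficulty(pass_rate: float, tau_e: float = 0.75, tau_m: float = 0.25) -> str:
    """Map a spec's solver pass rate r_{i,t} to a difficulty bucket.
    tau_e and tau_m correspond to tau_E and tau_M; the defaults here
    are assumed placeholders, not the paper's values."""
    if pass_rate >= tau_e:
        return "Easy"
    if pass_rate >= tau_m:      # tau_M <= r < tau_E
        return "Medium"
    if pass_rate > 0:           # 0 < r < tau_M
        return "Hard"
    return "Impossible"         # r == 0
```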

4. Expert Iteration via Rejection Fine-tuning

Solver training adopts a form of expert iteration or offline RL with binary reward, termed "rejection fine-tuning." Letting $D_t^*$ denote the set of formally verified $(x, y)$ pairs at iteration $t$:

$$D_t^* = \{(x_i, y^j_{i,t}) \mid v(x_i, y^j_{i,t}) = 1\}$$

The solver is trained using cross-entropy loss on this data, updating parameters $\theta$ via

$$\mathcal{L}_{\mathrm{solver}}(\theta) = -\frac{1}{|D_t^*|} \sum_{(x, y) \in D_t^*} \sum_{k=1}^{|y|} \log p_\theta(y[k] \mid x, y_{<k})$$

This paradigm prevents the propagation of unverified or spurious solutions, directly aligning model improvements with formal correctness.
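A minimal PyTorch rendering of this loss is sketched below, assuming a Hugging Face-style causal LM whose forward pass returns `.logits`, and the usual convention of masking spec tokens with label -100 so the sum runs over solution tokens $y$ only.

```python
import torch
import torch.nn.functional as F

def solver_loss(model, input_ids: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy L_solver over verified (x, y) pairs.
    `input_ids` holds spec tokens followed by solution tokens; `labels`
    copies input_ids but sets spec positions to -100, so only y tokens
    contribute, matching the sum over k = 1..|y| in the formula."""
    logits = model(input_ids).logits               # (batch, seq, vocab)
    shift_logits = logits[:, :-1, :].contiguous()  # position k predicts token k+1
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,                         # skip masked spec tokens
    )
```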

5. Model Architecture and Training Protocol

The backbone model for PSV-Verus is Qwen2.5-Coder-3B-Instruct, a 3B-parameter, decoder-only transformer. Solver fine-tuning uses LoRA adapters (lora_r = 16, lora_alpha = 32), targeting all major projection modules, with bfloat16 precision and gradient checkpointing for efficiency.

  • Solver: Fine-tuned on verified solutions for three epochs at a $2 \times 10^{-4}$ learning rate.
  • Proposer: Frozen weights; updated purely via in-context learning on novel, verified specifications and pass-rate labels.

Spec proposal and solution inference are performed using SGLang with dp-size 8 and temperature 0.8.
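The stated adapter settings could be expressed with Hugging Face `peft` roughly as follows. The target-module names are the standard projection layers of Qwen2.5-style models, inferred rather than quoted from the paper.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA settings as reported: lora_r = 16, lora_alpha = 32, targeting all
# major projection modules, with bfloat16 and gradient checkpointing.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-3B-Instruct",
    torch_dtype=torch.bfloat16,
)
model.gradient_checkpointing_enable()             # memory-saving, as described
model = get_peft_model(model, lora_config)
```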

6. Empirical Results, Scaling, and Ablation

PSV-Verus is evaluated on three Rust+Verus benchmarks: Dafny2Verus, MBPP-Verified, and HumanEval-Verified. The system achieves substantial performance improvements over inference-only or non-self-play baselines, with up to $9.6\times$ Pass@1 gains (e.g., 36.8% on MBPP compared to 3.8% for RFT in test-time training).

Table: Pass@1 Results Across Benchmarks

Method        Dafny2Verus   MBPP     HumanEval
AlphaVerus    24.1%         6.5%     7.2%
RFT           34.5%         3.8%     5.6%
PSV-Verus     65.6%         36.8%    19.1%

Scaling analysis indicates logarithmic improvements as the number of questions per iteration grows, with further gains from iterative over one-shot training. Ablations identify formal verification supervision as the most critical component: removing verification results in over 50% relative Pass@1 loss across benchmarks, while removing difficulty awareness or diversity from the proposer each causes a 5–11% drop.

7. Assumptions, Limitations, and Scope

  • Verifier properties: The effectiveness of PSV-Verus hinges on access to a sound (though potentially incomplete) specification verifier, such as Verus. Some correct programs may be rejected, capping coverage, but all accepted solutions are valid.
  • Compute requirements: The protocol is resource intensive, with solving and verifying thousands of specs per round requiring significant parallel compute (∼24 hours on 8× L40 GPUs for core experiments).
  • Spec filtering: A nontrivial fraction of candidate specs are invalid or un-compilable, necessitating robust pre-filtering procedures (e.g., compilation with stubs to ensure syntactic well-formedness).
  • Domain generality: PSV requires a verifiable formal specification language. Extending beyond domains with robust verifiers remains an open area.

A key implication is that this approach generalizes to any code or proof synthesis setting where formally checkable correctness is available; however, the necessity of such a verifier remains the main constraint.

8. Significance and Distinctiveness Within AI for Program Synthesis

PSV-Verus demonstrates that self-play, when mediated by formal verification, is sufficient for training performant code-generation LLMs from scratch, without any recourse to human-written problems or solutions (Wilf et al., 20 Dec 2025). This distinguishes it from game self-play or unit-test-driven code synthesis, which are vulnerable to over-fitting, reward exploitation, or propagation of human errors. By integrating difficulty-aware curricula, explicit verification gating, and rejection-based fine-tuning, PSV-Verus sets a new baseline for data-free curriculum development in verified programming domains.

References

  • Wilf et al., "Propose, Solve, Verify: Self-Play Through Formal Verification," 20 Dec 2025.
