Propose–Solve–Verify (PSV) Paradigm

Updated 23 December 2025

Propose–Solve–Verify (PSV) is a structured paradigm that splits problem-solving into candidate proposal, solution execution, and formal verification stages.
It integrates formal methods, LLM-driven tactic generation, and neural-symbolic reasoning to ensure process-level verifiability and transparency.
PSV frameworks, including FPS, SDV, and neural variants, advance benchmark performance in theorem proving, code synthesis, and geometry reasoning.

The Propose–Solve–Verify (PSV) paradigm delineates a structured approach for automating formal and mathematical problem solving in artificial intelligence. PSV decomposes the task into three sequential stages—proposing candidate actions or solution steps, executing those steps to construct an answer, and verifying the correctness of the answer through rigorous formal or symbolic checks. This methodology has been instantiated across proof assistants, generative LLMs, self-play in code synthesis, and neural-symbolic reasoning systems. Its core motivation is enabling end-to-end process-level verifiability as opposed to mere outcome-based evaluation, thereby elevating both the reliability and interpretability of AI-based problem-solving systems (Liu et al., 7 May 2025, Wilf et al., 20 Dec 2025, Zhong et al., 17 May 2025, Zhang et al., 10 Jul 2024).

1. Conceptual Framework and Formalization

Recent research formalizes problem solving within PSV as a deterministic Markov Decision Process (MDP), particularly in the context of theorem-proving environments such as Lean 4 (Liu et al., 7 May 2025). In this setting, a problem $P(\hat a) = \left(\forall_{i=1}^n v_i,\; \bigwedge_{i=1}^p\phi_i\; \rightarrow\; \bigwedge_{i=1}^q\psi_i\right)[a\mapsto\hat a]$ is encoded with independent variables $v_i$, hypotheses $\phi_i$, and conclusions $\psi_i$. The MDP's state space $S$ comprises pairs of unassigned metavariables ("holes") $H$ and unproven goals $G$, while the action space $A$ consists of solution steps, typically the application of formal tactics.

Transitions encode tactic application and are kernel-checked to ensure type and logic preservation. The reward function is sparse, granting a reward only when all holes are filled and all goals are proved, i.e., when both $H$ and $G$ are empty. This architecture supports policy learning, search heuristics, and step ranking, so LLM-driven or search-based agents can be dropped in for state exploration.
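The state, transition, and sparse-reward structure described above can be illustrated with a minimal Python sketch. It is a toy model, not the Lean 4 implementation: holes and goals are plain string labels, the "kernel check" is reduced to a membership test, and the names `State`, `step`, `fill_hole`, and `close_goal` as well as the reward value 1.0 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class State:
    holes: FrozenSet[str]  # unassigned metavariables H (toy labels)
    goals: FrozenSet[str]  # unproven goals G (toy labels)

    def is_terminal(self) -> bool:
        # Terminal exactly when both H and G are empty.
        return not self.holes and not self.goals


# An action returns the successor state, or None when the (toy) checker rejects it.
Action = Callable[[State], Optional[State]]


def step(state: State, action: Action) -> Tuple[State, float, bool]:
    """Deterministic transition with a sparse terminal reward."""
    nxt = action(state)
    if nxt is None:
        # Rejected step: the state is left unchanged and no reward is given.
        return state, 0.0, False
    done = nxt.is_terminal()
    return nxt, (1.0 if done else 0.0), done


def fill_hole(name: str) -> Action:
    """Hypothetical action that assigns one metavariable."""
    def act(s: State) -> Optional[State]:
        if name not in s.holes:
            return None
        return State(s.holes - {name}, s.goals)
    return act


def close_goal(name: str) -> Action:
    """Hypothetical action that discharges one goal."""
    def act(s: State) -> Optional[State]:
        if name not in s.goals:
            return None
        return State(s.holes, s.goals - {name})
    return act
```

In this sketch a policy (LLM-driven or search-based) would only need to choose which `Action` to pass to `step`; the reward stays at zero until the last hole and goal are gone.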

2. PSV Variants and Extensions

FPS and D-FPS Frameworks

"Formal Problem-Solving" (FPS) operationalizes PSV directly in Lean 4 by leveraging its tactic engine for both proposing and executing solution steps, with verification embedded by construction. No custom loss functions are required; any existing proof-search objective (MCTS, imitation learning) is applicable. The soundness theorem for FPS guarantees that any answer extracted from a terminal state satisfies the original problem.

Deductive FPS (D-FPS) addresses "find-all" declarative problems by decoupling the solution process: it enforces the solution type $\mathsf{Prop}$, splits the proof into a forward subtask (deductive discovery of a candidate answer) and a backward subtask (confirmation of equivalence), and is shown to be expressive, complete, and sound. D-FPS, while more transparent and human-aligned, typically achieves lower empirical solve rates (Liu et al., 7 May 2025).
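The forward/backward split can be illustrated on a toy "find-all" problem in Lean 4. This is a sketch of the idea only, not the encoding used by the D-FPS authors: the problem (find all natural numbers $x$ with $x + 2 = 5$) is an assumption chosen for illustration, and the `omega` tactic is assumed to be available in the toolchain in use.

```lean
-- Toy "find-all" problem: all natural numbers x with x + 2 = 5.
-- The answer is stated as a proposition (x = 3) and shown equivalent to the condition.
example (x : Nat) : x + 2 = 5 ↔ x = 3 := by
  constructor
  · -- forward: deduce the candidate answer from the hypothesis
    intro h
    omega
  · -- backward: confirm that the candidate satisfies the original condition
    intro h
    omega
```

Stating the answer as a proposition and proving both directions is what makes the result a full characterization of the solution set rather than a single witness.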

SDV and FlexiVe: Adaptive Verification

Solve–Detect–Verify (SDV) generalizes PSV for inference-time scaling in LLM-based reasoning. SDV introduces an intermediate "Detect" stage, an on-the-fly monitor that curbs excessive reasoning and allocates verification compute only when necessary. The "FlexiVe" verifier alternates between fast, resource-efficient error checking and slow, detailed error diagnosis, using agreement-based escalation to balance accuracy against computational cost (Zhong et al., 17 May 2025).
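Agreement-based escalation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FlexiVe implementation: the function name `flexible_verify`, the sample count `k`, the agreement threshold `tau`, and the boolean verdict type are all hypothetical, and `fast` and `slow` stand in for the cheap and expensive verifier calls.

```python
from collections import Counter
from typing import Callable, Tuple


def flexible_verify(
    trace: str,
    fast: Callable[[str], bool],
    slow: Callable[[str], bool],
    k: int = 4,
    tau: float = 0.75,
) -> Tuple[bool, str]:
    """Return (verdict, mode), escalating to the slow verifier on disagreement."""
    # Sample k cheap verdicts on the same reasoning trace.
    votes = [fast(trace) for _ in range(k)]
    verdict, count = Counter(votes).most_common(1)[0]
    # If enough fast verdicts agree, accept the majority verdict as is.
    if count / k >= tau:
        return verdict, "fast"
    # Otherwise spend extra compute on a single detailed check.
    return slow(trace), "slow"
```

With `k = 4` and `tau = 0.75`, three matching fast verdicts are enough to skip the slow path, while a two-versus-two split triggers escalation. Raising `tau` trades compute for accuracy.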

PSV Self-Play in Code Synthesis

PSV enables self-play training for code generation under SMT-backed formal verification. Here, a proposer synthesizes new, difficulty-targeted specifications based on solver pass rates; a solver generates candidate programs; and a verifier accepts only candidates that are formally checked against the specification. Iterating this cycle without human supervision substantially improves model generalization and reliability, provided that verification and difficulty awareness are maintained (Wilf et al., 20 Dec 2025).
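One round of this cycle can be sketched as follows. The code is a schematic illustration, not the training pipeline of the cited work: `psv_round`, `difficulty_label`, the sample count `k`, and the pass-rate thresholds are hypothetical, `solve` and `verify` stand in for the solver model and the formal verifier, and specifications are assumed to be hashable so they can key a dictionary.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def psv_round(
    specs: List[Hashable],
    solve: Callable[[Hashable], object],
    verify: Callable[[Hashable, object], bool],
    k: int = 4,
) -> Tuple[List[Tuple[Hashable, object]], Dict[Hashable, float]]:
    """Sample k candidates per spec; keep verified pairs and record pass rates."""
    verified: List[Tuple[Hashable, object]] = []
    pass_rate: Dict[Hashable, float] = {}
    for spec in specs:
        candidates = [solve(spec) for _ in range(k)]
        passing = [c for c in candidates if verify(spec, c)]
        # The pass rate is the difficulty signal fed back to the proposer.
        pass_rate[spec] = len(passing) / k
        # Only verified (spec, program) pairs are kept as training data.
        verified.extend((spec, c) for c in passing)
    return verified, pass_rate


def difficulty_label(rate: float, easy: float = 0.75, hard: float = 0.25) -> str:
    """Bucket a pass rate into a coarse label (thresholds are assumptions)."""
    if rate >= easy:
        return "easy"
    if rate <= hard:
        return "hard"
    return "medium"
```

In a full loop, the labeled specifications would serve as in-context exemplars when prompting the proposer for new specifications, and the verified pairs would be used to update the solver before the next round.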

Neural-Symbolic PSV in Geometry Reasoning

PGPSNet-v2 instantiates PSV in plane geometry: modal fusion of diagram and problem text (Propose), explicable theorem-grammar solution programs (Solve), and multi-level theorem verification (Verify)—addressing both multi-modal and knowledge-driven reasoning (Zhang et al., 10 Jul 2024).

3. Methods for Proposal, Solving, and Verification

Proposal Mechanisms

LLM-based Tactic Generation: LLM or heuristic-driven policies propose the next action given the proof or solution state (Liu et al., 7 May 2025).
In-Context Difficulty Conditioning: The proposer LLM is prompted with exemplars across labeled difficulty ranges to generate targeted new specs for training or challenge generation (Wilf et al., 20 Dec 2025).
Modal Fusion in Geometry: Structured and semantic clauses parsed from data and diagrams are fused with textual problem statements via pretrained encoders (Zhang et al., 10 Jul 2024).

Solving Strategies

Formal Tactic Application: Each solution step is kernel-checked and transitions the MDP state (Liu et al., 7 May 2025).
Autoregressive Program Synthesis: Sequence generation, constrained by self-limited vocabularies or grammars, ensures interpretable and explicable reasoning (Zhang et al., 10 Jul 2024).
Candidate Sampling and Retrying: SDV and PSV frameworks explore multiple traces or program candidates, with verification feedback enabling one-shot or majority-based refinement (Zhong et al., 17 May 2025, Wilf et al., 20 Dec 2025).

Verification Approaches

Kernel-Embedded: In FPS, Lean's kernel verification is integral to every step (Liu et al., 7 May 2025).
Restricted Propositional Equivalence (RPE): Symbolic, tactic-limited checks establish formal correctness relative to ground truth answers, attaining near-perfect precision and recall (Liu et al., 7 May 2025).
SMT-backed and Semantic Checks: In code generation, program correctness is established by sound SMT-based checks in Verus (Wilf et al., 20 Dec 2025).
Multi-level Theorem Verification: Geometry solvers apply form, calculability, and semantic checks to candidate solution programs (Zhang et al., 10 Jul 2024).
Dual-Mode Generative Verification: FlexiVe adapts between rapid error prediction and detailed analysis as required, modulated by agreement thresholds (Zhong et al., 17 May 2025).

4. Benchmark Datasets and Empirical Performance

PSV and its derivatives have motivated new benchmarks that formalize human mathematics problems for end-to-end, process-verified solving:

Benchmark	Description	Top PSV Solve Rate
FormalMath500	Algebra, number theory, and precalculus problems formalized from MATH500	23.77 %
MiniF2F-Solving	High-school-level tasks from MiniF2F	27.47 %
PutnamBench-Solving	Putnam questions posed as "find all"/compute tasks	0.31 %

In code synthesis, PSV-Verus demonstrates significant gains over prompting and expert iteration: Pass@1 on Dafny2Verus rises from 24.06–34.46 % (baselines) to 65.63 % and on MBPP from 6.48–3.83 % (baselines) to 36.78 %, with similarly strong improvements on HumanEval (Wilf et al., 20 Dec 2025). SDV/FlexiVe yields a 16-point improvement in math reasoning accuracy over self-consistency at comparable inference compute (Zhong et al., 17 May 2025). PGPSNet-v2 outperforms symbolic and neural geometry solvers while maintaining full solution trace explainability (Zhang et al., 10 Jul 2024).

The empirical gap between process-verified solving and simple answer soundness highlights the complexity added by hole management, explicit solution derivation, and intermediate variable discovery—especially for LLMs under coupled metavariable regimes (Liu et al., 7 May 2025).

5. Theoretical Properties and Failure Modes

Soundness of FPS and PSV frameworks is formally established: any solution extracted from a terminal proof state or verified output is guaranteed to satisfy the specification (Liu et al., 7 May 2025, Wilf et al., 20 Dec 2025). RPE achieves 100 % precision and 97.2 % recall against human judgement, with Cohen's κ = 0.97, demonstrating high alignment with human correctness standards (Liu et al., 7 May 2025).

Nonetheless, limitations persist:

LLMs often fail to fill holes correctly and to manage unknown variables under coupled PSV protocols.
Pure proof search models omit the explicit "unknown discovery" necessary for full answer derivation.
In SDV, multi-stage orchestration (detector subcalls, verifier rollouts) introduces compute overhead.
D-FPS, while highly human-aligned in answer equivalence, solves fewer problems than coupled FPS methods.
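The complexity of formal verification restricts scalability. Moreover, because Rice's theorem rules out a verifier that is sound, complete, and always terminating for nontrivial program properties, a sound verifier is necessarily incomplete and may reject correct solutions.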
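For readers who want to reproduce agreement statistics of the kind reported for RPE in this section (precision, recall, Cohen's κ against human judgement), the standard formulas for a binary confusion matrix are sketched below. The function name `agreement_metrics` is hypothetical, the counts in the usage note are made-up illustration values rather than data from any cited paper, and the sketch assumes all denominators are nonzero.

```python
from typing import Tuple


def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float, float]:
    """Precision, recall, and Cohen's kappa for a binary checker vs. human labels."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)  # accepted answers that humans also accept
    recall = tp / (tp + fn)     # human-accepted answers that the checker accepts
    p_observed = (tp + tn) / n  # raw agreement rate
    # Agreement expected by chance from the two raters' marginal frequencies.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return precision, recall, kappa
```

With illustrative counts `tp=40, fp=0, fn=10, tn=50`, the checker never accepts a wrong answer (precision 1.0) but misses a fifth of the correct ones (recall 0.8). This is the same pattern, in exaggerated form, that a sound but incomplete verifier produces.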

6. Applications and Implications

PSV frameworks provide the foundation for robust process-level verification across multiple domains:

Mathematical reasoning and competitive problem solving
Verified code generation and formal specification satisfaction
Geometry reasoning with multi-modal data fusion
Adaptive, RL-trained inference-time verification in LLMs

A plausible implication is that PSV methodologies, coupled with symbolic or kernel-backed verification, will become central to future research in interpretable, auditable AI systems. The explicit separation of propose, solve, and verify enables modular improvements and transparent tracking of reasoning failures, supporting both benchmarking and iterative refinement. The persistent gap between formal theorem proving and full process-verified problem solving suggests fertile ground for research in unknown discovery, answer management, and hybrid search strategies. The extension to domains beyond mathematics (e.g., programming, commonsense reasoning, multi-modal tasks) remains an open but promising direction (Liu et al., 7 May 2025, Wilf et al., 20 Dec 2025, Zhong et al., 17 May 2025, Zhang et al., 10 Jul 2024).