DeepSeek-Prover-V1.5: Lean 4 Theorem Prover
- The paper introduces a two-stage training framework that combines supervised fine-tuning with reinforcement learning from proof assistant feedback (RLPAF) to optimize the model toward Lean-verified proofs.
- It implements a hybrid truncate-and-resume mechanism with reward-maximizing tree search to efficiently explore vast formal proof spaces.
- The system achieves state-of-the-art accuracy on benchmarks like miniF2F and ProofNet, outperforming previous models in formal mathematical reasoning.
DeepSeek-Prover-V1.5 is an open-source formal theorem proving LLM specialized for Lean 4, designed to advance both model-centric and search-centric paradigms for automated mathematical reasoning. Its development introduced new supervised and reward-based learning strategies, integration of explicit proof state feedback, and a scalable hybrid proof generation/search algorithm, establishing state-of-the-art performance on diverse formal mathematics benchmarks.
1. Model Architecture and Formalization Focus
DeepSeek-Prover-V1.5 adopts a 7B-parameter LLM backbone (initialized from DeepSeekMath-Base) and specializes it for formal mathematical languages. The model is extensively exposed to Lean 4, with auxiliary coverage of Isabelle and Metamath. Lean 4 tactic syntax and proof structure are central to its generative capabilities: the model handles both global proof plans (whole-proof generation) and local tactic transitions (interactive, state-annotated), with explicit support for chain-of-thought (CoT) reasoning and auxiliary tactic state annotation via LeanDojo.
The architecture supports both single-pass whole-proof generation—where an entire proof script is generated up-front—and interactive, stepwise generation, wherein proofs are constructed incrementally with intermediate verification and resumption informed by tactic state annotations.
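To make the state-annotation format concrete, below is a minimal, hypothetical Lean 4 snippet in the spirit of the state-annotated training data: a block comment records the tactic state ahead of a proof step, so a model can condition on it when resuming. The exact comment layout used in the actual corpus may differ.

```lean
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  /- tactic state (as would be recorded from the Lean REPL):
     a b : Nat
     ⊢ a + b = b + a -/
  rw [Nat.add_comm]
```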
2. Training Methodology: SFT and Reinforcement from Proof Assistant Feedback
Training employs a two-stage approach:
- Supervised Fine-Tuning (SFT):
  - The model is fine-tuned on an expanded corpus (~9.6M sequences) built via expert iteration over Mathlib4, Lean Workbook, synthetic DeepSeek-Prover-V1 data, and miniF2F/ProofNet problems.
  - Chain-of-thought comments generated by DeepSeek-Coder-V2 236B are interleaved within Lean proofs, enforcing explicit formal reasoning steps.
  - Tactic state annotations (from the Lean REPL) are appended as block comments and used as auxiliary supervision, enabling the truncate-and-resume protocol within tree search and stepwise inference.
- Reinforcement Learning from Proof Assistant Feedback (RLPAF):
  - Group Relative Policy Optimization (GRPO) is used, sampling 32 candidate proofs per prompt; the reward is 1 for a Lean-verified proof and 0 otherwise (see the sketch after this list).
  - Objective: optimize the group-relative binary reward signal, aligning the model policy toward search-space regions that reliably pass Lean 4 verification.
  - RL is applied across a curated set of ~4.5k theorems, leveraging CoT-enriched prompts and exploiting the model's tactic-state prediction capabilities.
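The core of RLPAF can be illustrated with a short sketch of the group-relative advantage computation. This is an illustrative approximation rather than the released training code: `sample_proofs` and `lean_verify` are hypothetical stand-ins for the model's sampler and the Lean 4 verifier, and the full GRPO objective additionally includes a clipped policy-ratio term and a KL penalty.

```python
import statistics

GROUP_SIZE = 32  # candidate proofs sampled per theorem prompt

def group_relative_advantages(rewards):
    """GRPO-style advantages: center each binary reward on the group mean
    and scale by the group standard deviation (no learned value critic)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

def rlpaf_step(theorem_prompt, sample_proofs, lean_verify):
    # 1. Sample a group of candidate whole proofs for the same theorem.
    proofs = sample_proofs(theorem_prompt, n=GROUP_SIZE)
    # 2. Binary reward from the proof assistant: 1 if Lean 4 accepts, else 0.
    rewards = [1.0 if lean_verify(theorem_prompt, p) else 0.0 for p in proofs]
    # 3. Group-relative advantages weight the policy-gradient update.
    advantages = group_relative_advantages(rewards)
    return list(zip(proofs, advantages))
```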
3. Truncate-and-Resume Mechanism and RMaxTS Search
The "truncate-and-resume" mechanism is a hybrid proof generation protocol, enabling correction and extension of partial proofs:
- The model generates a complete proof, which is submitted to the Lean verifier.
- On verification error, the proof is truncated at the first failed tactic; the successful prefix and the current tactic state are then used to prompt the model for a continuation.
- This iterative resumption allows error recovery and deeper search.
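A minimal sketch of this loop, under the assumption of a verifier that reports the position of the first failing tactic and the tactic state at that point (`verify_with_feedback` and its result fields are hypothetical names, not the paper's actual API):

```python
def truncate_and_resume(theorem_stmt, generate_proof, verify_with_feedback,
                        max_attempts=8):
    """Whole-proof generation with error-driven truncation and resumption."""
    prefix = ""  # verified proof prefix accumulated so far
    for _ in range(max_attempts):
        # Generate a complete proof continuation from the current prefix.
        candidate = prefix + generate_proof(theorem_stmt, prefix)
        result = verify_with_feedback(theorem_stmt, candidate)
        if result.ok:
            return candidate  # Lean 4 accepted the full proof
        # Truncate at the first failing tactic; keep the successful prefix
        # and append the reported tactic state as a block comment so the
        # model can condition on it when resuming.
        prefix = (candidate[:result.first_error_offset]
                  + f"\n  /- tactic state:\n     {result.tactic_state} -/\n  ")
    return None  # no verified proof found within the attempt budget
```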
RMaxTS (Reward-Maximizing Tree Search) is a variant of Monte Carlo Tree Search adapted to the sparse rewards typical of formal theorem proving (a selection and backpropagation sketch follows the list):
- Intrinsic reward: expanding a new node in the proof search tree yields an intrinsic reward that substitutes for the sparse extrinsic signal, driving non-trivial exploration.
- Discounted UCB: Non-stationary confidence bounds guide selection, weighting recent successes higher and mitigating local plateaus.
- Highly parallelized runners and asynchronous Lean calls support scalable search over vast proof spaces.
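The selection rule can be sketched as a discounted variant of UCB1 combined with the intrinsic "new node expanded" reward. This is a simplified single-node view under assumed data structures (an `Edge` record with discounted statistics); the paper's full algorithm additionally manages virtual losses and asynchronous Lean verification across parallel runners.

```python
import math
from dataclasses import dataclass

GAMMA = 0.99  # discount applied to past statistics (non-stationary bandit)

@dataclass
class Edge:
    visits: float = 0.0   # discounted visit count
    value: float = 0.0    # discounted sum of rewards
    child: object = None  # subtree reached through this edge

def select_child(edges, exploration=2.0):
    """Discounted-UCB selection over a node's outgoing edges."""
    total = sum(e.visits for e in edges) + 1.0
    def ucb(e):
        mean = e.value / (e.visits + 1e-8)
        bonus = exploration * math.sqrt(math.log(total) / (e.visits + 1e-8))
        return mean + bonus
    return max(edges, key=ucb)

def backpropagate(path, expanded_new_node):
    """Intrinsic reward: 1 if the rollout expanded a new tree node, else 0."""
    reward = 1.0 if expanded_new_node else 0.0
    for edge in path:
        # Discount old statistics before adding the new observation,
        # so recent outcomes dominate (mitigating local plateaus).
        edge.visits = GAMMA * edge.visits + 1.0
        edge.value = GAMMA * edge.value + reward
```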
4. Datasets, Formal Language Specialization, and Data Augmentations
- Pre-training utilizes DeepSeekMath-Base (mathematical code, Lean, Isabelle, Metamath).
- Fine-tuning leverages Mathlib4, synthetic theorems, expert-iteration data, and benchmark-driven problems.
- All proofs are enriched with chain-of-thought annotations and intermediate tactic state comments, supporting both training and inference workflows.
- Expert iteration cycles expand coverage and robustness by systematically retraining on newly Lean-verified proofs (a schematic loop is sketched below).
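A schematic of one expert-iteration cycle, under the assumption of generic `sample_proofs`, `lean_verify`, and `finetune` hooks (all hypothetical names standing in for the actual pipeline components):

```python
def expert_iteration(model, theorem_pool, sft_dataset,
                     sample_proofs, lean_verify, finetune, rounds=4):
    """Iteratively grow the SFT corpus with newly Lean-verified proofs."""
    for _ in range(rounds):
        new_examples = []
        for theorem in theorem_pool:
            for proof in sample_proofs(model, theorem):
                if lean_verify(theorem, proof):
                    # Only proofs accepted by the Lean 4 verifier are kept.
                    new_examples.append((theorem, proof))
                    break
        sft_dataset.extend(new_examples)
        model = finetune(model, sft_dataset)  # retrain on the expanded corpus
    return model, sft_dataset
```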
5. Benchmark Evaluation and Performance Gains
DeepSeek-Prover-V1.5 achieves state-of-the-art accuracy:
- miniF2F-test (high-school and Olympiad level):
  - Single-pass whole-proof generation: 60.2%
  - RMaxTS tree search: 63.5%
  - Previous SOTA: DeepSeek-Prover-V1 (DSP-V1) 50.0%, InternLM2-StepProver 54.5%
- ProofNet-test (undergraduate level):
  - Single-pass whole-proof generation: 23.7%
  - RMaxTS tree search: 25.3%
  - Previous SOTA: ReProver 13.8%, InternLM2-StepProver 18.1%
The integration of RLPAF and search-centric inference delivers orthogonal improvements: tree search increases sample efficiency and robustness to error, while RL-trained models align generated proofs to verifiable Lean standards, especially for distribution-shifted or "hard" proof problems.
6. Practical Implications, Model Selection, and Technical Significance
DeepSeek-Prover-V1.5’s design enables robust and versatile Lean formalization workflows:
- Unified Model & Search Interface: The same model supports both fast, single-pass generation (for easy theorems) and iterative, correction-rich search (for harder theorems or ambiguous spaces).
- Scalable Formal Reasoning: Tree search and state annotation strategies support efficient exploration of large proof spaces, essential for competition-level and university-level problems.
- Reproducible Formalization: Direct Lean 4 feedback loop guarantees solution correctness, offering a practical foundation for integrating LLMs with interactive theorem proving.
- Model Selection Guidance: Provides actionable insights for real-world deployment, with performance tiers across logical reasoning, planning, and text-centric tasks as detailed in (Zhao et al., 16 Feb 2025).
7. Context, Related Methodologies, and Limitations
DeepSeek-Prover-V1.5 advances over DeepSeek-Prover-V1 principally through the inclusion of RLPAF, thought-augmented data, and search-centric methodologies. However, subsequent models such as Goedel-Prover (Lin et al., 11 Feb 2025), Leanabell-Prover (Zhang et al., 8 Apr 2025), and MA-LoT (Wang et al., 5 Mar 2025) demonstrate further gains, exploiting even larger datasets, multi-agent chain-of-thought reasoning, and diverse formalization styles.
Limitations:
- Reasoning enhancements may reduce performance on text-centric tasks; users should consult benchmarking tables and model selection charts (Zhao et al., 16 Feb 2025).
- Sample efficiency improves with hybrid search and RL alignment but may be limited for certain problem distributions.
- For tasks requiring nuanced text understanding, entity extraction, or named entity recognition, alternative model families (Qwen, Llama) or instruction-tuned baselines may be preferable for cost-sensitive deployment.
In summary, DeepSeek-Prover-V1.5 embodies a modern, reward-aligned, and search-integrated approach for Lean 4 formal theorem proving, establishing strong new benchmarks in accuracy and proof space exploration efficiency. Its architecture and training regimen serve as a reference for subsequent advancements in automated formal mathematical reasoning.