DeepSeek-Prover-V1.5: Lean 4 Theorem Prover

Updated 4 November 2025
  • The paper introduces a two-stage training framework, combining supervised fine-tuning with reinforcement learning from proof assistant feedback, to steer the model toward Lean-verified proofs.
  • It implements a hybrid truncate-and-resume mechanism with reward-maximizing tree search to efficiently explore vast formal proof spaces.
  • The system achieves state-of-the-art accuracy on benchmarks like miniF2F and ProofNet, outperforming previous models in formal mathematical reasoning.

DeepSeek-Prover-V1.5 is an open-source formal theorem proving LLM specialized for Lean 4, designed to advance both model-centric and search-centric paradigms for automated mathematical reasoning. Its development introduced new supervised and reward-based learning strategies, integration of explicit proof state feedback, and a scalable hybrid proof generation/search algorithm, establishing state-of-the-art performance on diverse formal mathematics benchmarks.

1. Model Architecture and Formalization Focus

DeepSeek-Prover-V1.5 adopts a 7B-parameter LLM backbone (pre-trained as DeepSeekMath-Base) and specializes it for formal mathematical languages. The model is extensively exposed to Lean 4, with auxiliary coverage of Isabelle and Metamath. Lean 4 tactic syntax and proof structure are central to its generative capabilities: the model handles both global proof plans (whole-proof generation) and local tactic transitions (interactive, state-annotated generation), with explicit support for chain-of-thought (CoT) reasoning and auxiliary tactic-state annotation via LeanDojo.

The architecture supports both single-pass whole-proof generation—where an entire proof script is generated up-front—and interactive, stepwise generation, wherein proofs are constructed incrementally with intermediate verification and resumption informed by tactic state annotations.
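The schematic Lean 4 snippet below illustrates the kind of annotated proof format described above: natural-language chain-of-thought as line comments and tactic states as block comments. The exact annotation conventions of the paper's training corpus may differ; this is only an approximation.

```lean
-- Schematic example (annotation format approximated, not taken from the paper's data).
theorem succ_eq_of_eq (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
  -- CoT: rewriting with h turns the goal into `b + 1 = b + 1`, which closes by rfl.
  rw [h]
  /- tactic state after `rw [h]`: no goals -/
```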

2. Training Methodology: SFT and Reinforcement from Proof Assistant Feedback

Training employs a two-stage approach:

  • Supervised Fine-Tuning (SFT):
    • The model is fine-tuned on an expanded corpus (~9.6M sequences) built on expert iteration over Mathlib4, Lean Workbook, synthetic DeepSeek-Prover-V1 data, and miniF2F/ProofNet problems.
    • Chain-of-thought comments generated via DeepSeek-Coder V2 236B are interleaved within Lean proofs, enforcing explicit formal reasoning steps.
    • Tactic state annotations (from Lean REPL) are appended as block comments and used as auxiliary supervision, enabling the truncate-and-resume protocol within tree search and stepwise inference.
  • Reinforcement Learning from Proof Assistant Feedback (RLPAF):
    • Group Relative Policy Optimization (GRPO) is used, sampling 32 candidate proofs per prompt; the reward is 1 for a Lean-verified proof and 0 otherwise (a minimal sketch of the group-relative advantage appears after this list).
    • Objective: Optimize the relative binary reward signal, aligning model policy toward search-space regions reliably passing Lean 4 verification.
    • RL is applied across a curated set of ~4.5k theorems, leveraging CoT-enriched prompts and exploiting the model’s tactic-state prediction capabilities.
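As referenced above, the sketch below shows the group-relative advantage computation under a binary Lean-verification reward. It covers only the advantage step of GRPO (not the clipped policy-ratio objective), and all names are illustrative rather than the paper's implementation.

```python
# Group-relative advantages with a binary Lean-verification reward (illustrative).
from statistics import mean, pstdev

def group_relative_advantages(verified: list[bool]) -> list[float]:
    """For one prompt, convert each sampled proof's pass/fail outcome into an
    advantage relative to the group of (e.g. 32) samples drawn for that prompt."""
    rewards = [1.0 if ok else 0.0 for ok in verified]   # 1 iff Lean accepts the proof
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0.0:                                    # all pass or all fail:
        return [0.0] * len(rewards)                     # no learning signal for this group
    return [(r - mu) / sigma for r in rewards]

# Example: 32 candidate proofs for one theorem, 5 of which Lean verifies.
advantages = group_relative_advantages([True] * 5 + [False] * 27)
```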

The "truncate-and-resume" mechanism is a hybrid proof generation protocol, enabling correction and extension of partial proofs:

  • The model generates a complete proof, which is submitted to the Lean verifier.
  • On verification error, the proof is truncated at the first failed tactic; successful prefix and current tactic state are used to prompt the model for continuation.
  • This iterative resumption allows error recovery and deeper search.
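The sketch below captures this loop under hypothetical interfaces: `generate` stands in for the model's whole-proof generator conditioned on a verified prefix and tactic state, and `verify` for a Lean 4 verifier wrapper; neither is the paper's actual API.

```python
from typing import Callable, NamedTuple, Optional

class LeanResult(NamedTuple):
    ok: bool
    successful_prefix: str   # proof text up to (not including) the first failed tactic
    tactic_state: str        # Lean tactic state at the truncation point

def truncate_and_resume(
    theorem: str,
    generate: Callable[[str, str, Optional[str]], str],   # (theorem, prefix, state) -> continuation
    verify: Callable[[str, str], LeanResult],              # Lean 4 verifier wrapper
    max_rounds: int = 8,
) -> Optional[str]:
    prefix, state = "", None
    for _ in range(max_rounds):
        candidate = prefix + generate(theorem, prefix, state)  # whole-proof attempt from the prefix
        result = verify(theorem, candidate)
        if result.ok:
            return candidate                                   # Lean accepts the complete proof
        # Truncate at the first failing tactic; keep the verified prefix and the
        # tactic state reported there to prompt the next continuation.
        prefix, state = result.successful_prefix, result.tactic_state
    return None                                                # exhausted the attempt budget
```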

RMaxTS (Reward-Maximizing Tree Search) is a variant of Monte-Carlo Tree Search adapted to the sparse rewards typical of formal theorem proving (a simplified sketch of the selection and backup rules follows the list):

  • Intrinsic reward: New tree node expansion in the proof search tree serves as a proxy for extrinsic reward, allowing non-trivial exploration.
  • Discounted UCB: Non-stationary confidence bounds guide selection, weighting recent successes higher and mitigating local plateaus.
  • Highly parallelized runners and asynchronous Lean calls support scalable search over vast proof spaces.
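A simplified sketch of the selection and backup rules is given below. The discounting scheme, constants, and data structures are illustrative simplifications of the non-stationary UCB described above, not the paper's exact formulation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    children: dict = field(default_factory=dict)  # tactic / tactic-state edge -> child node
    disc_visits: float = 0.0                      # discounted visit count
    disc_reward: float = 0.0                      # discounted accumulated reward

def intrinsic_reward(new_nodes_added: int) -> float:
    """A rollout that adds at least one new node to the search tree earns reward 1,
    else 0, standing in for the sparse extrinsic proof-completion reward."""
    return 1.0 if new_nodes_added > 0 else 0.0

def backup(path: list[Node], reward: float, gamma: float = 0.99) -> None:
    """Discounted backup: older statistics decay, so recent successes weigh more."""
    for node in path:
        node.disc_visits = gamma * node.disc_visits + 1.0
        node.disc_reward = gamma * node.disc_reward + reward

def select_child(node: Node, c: float = 1.0) -> Node:
    """Discounted-UCB selection among the children of an expanded node."""
    total = sum(ch.disc_visits for ch in node.children.values())
    def ucb(ch: Node) -> float:
        if ch.disc_visits == 0.0:
            return float("inf")                   # always try unvisited children first
        exploit = ch.disc_reward / ch.disc_visits
        explore = c * math.sqrt(math.log(1.0 + total) / ch.disc_visits)
        return exploit + explore
    return max(node.children.values(), key=ucb)
```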

4. Datasets, Formal Language Specialization, and Data Augmentations

  • Pre-training utilizes DeepSeekMath-Base (mathematical code, Lean, Isabelle, Metamath).
  • Fine-tuning leverages Mathlib4, synthetic theorems, expert-iteration data, and benchmark-driven problems.
  • All proofs are enriched with chain-of-thought annotations and intermediate tactic state comments, supporting both training and inference workflows.
  • Expert iteration cycles expand coverage and robustness by systematically retraining on newly Lean-verified proofs (a compact sketch of one such cycle follows this list).
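The compact sketch below shows one such expert-iteration cycle; `sample`, `verify`, and `finetune` are hypothetical callables standing in for the model sampler, the Lean 4 verifier, and the supervised fine-tuning step.

```python
# One expert-iteration loop (illustrative; interfaces are assumptions, not the paper's code).
def expert_iteration(model, theorems, train_set, sample, verify, finetune,
                     rounds=3, samples_per_theorem=16):
    for _ in range(rounds):
        new_proofs = [(thm, proof)
                      for thm in theorems
                      for proof in sample(model, thm, samples_per_theorem)
                      if verify(thm, proof)]        # keep only Lean-verified proofs
        train_set.extend(new_proofs)                # grow the supervised corpus
        model = finetune(model, train_set)          # retrain on the expanded data
    return model
```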

5. Benchmark Evaluation and Performance Gains

DeepSeek-Prover-V1.5 achieves state-of-the-art accuracy:

  • miniF2F-test (High school Olympiad)
    • Single-pass: 60.2%
    • RMaxTS (Tree search): 63.5%
    • Previous SOTA: DeepSeek-Prover-V1 50.0%, InternLM2-StepProver 54.5%
  • ProofNet-test (Undergraduate)
    • Single-pass: 23.7%
    • RMaxTS: 25.3%
    • Previous SOTA: ReProver 13.8%, InternLM2-StepProver 18.1%

The integration of RLPAF and search-centric inference delivers complementary improvements: tree search increases sample efficiency and robustness to errors, while RL training aligns generated proofs with what Lean 4 actually verifies, especially on distribution-shifted or "hard" problems.

6. Practical Implications, Model Selection, and Technical Significance

DeepSeek-Prover-V1.5’s design enables robust and versatile Lean formalization workflows:

  • Unified Model & Search Interface: The same model supports both fast, single-pass generation (for easy theorems) and iterative, correction-rich search (for harder theorems or ambiguous spaces).
  • Scalable Formal Reasoning: Tree search and state annotation strategies support efficient exploration of large proof spaces, essential for competition-level and university-level problems.
  • Reproducible Formalization: The direct Lean 4 verification loop guarantees that every accepted proof is formally correct, offering a practical foundation for integrating LLMs with interactive theorem proving.
  • Model Selection Guidance: Provides actionable insights for real-world deployment, with performance tiers across logical reasoning, planning, and text-centric tasks as detailed in (Zhao et al., 16 Feb 2025).

DeepSeek-Prover-V1.5 advances over DeepSeek-Prover-V1 principally through the inclusion of RLPAF, thought-augmented data, and search-centric methodologies. However, subsequent models such as Goedel-Prover (Lin et al., 11 Feb 2025), Leanabell-Prover (Zhang et al., 8 Apr 2025), and MA-LoT (Wang et al., 5 Mar 2025) demonstrate further gains, exploiting even larger datasets, multi-agent chain-of-thought reasoning, and diverse formalization styles.

Limitations:

  • Reasoning enhancements may reduce performance on text-centric tasks; users should consult benchmarking tables and model selection charts (Zhao et al., 16 Feb 2025).
  • Sample efficiency improves with hybrid search and RL alignment but may be limited for certain problem distributions.
  • For tasks requiring nuanced text understanding or named entity recognition, alternative model families (Qwen, Llama) or instruction-tuned baselines may be preferable in cost-sensitive deployments.

In summary, DeepSeek-Prover-V1.5 embodies a modern, reward-aligned, and search-integrated approach for Lean 4 formal theorem proving, establishing strong new benchmarks in accuracy and proof space exploration efficiency. Its architecture and training regimen serve as a reference for subsequent advancements in automated formal mathematical reasoning.
