
DeepSeekMath-V2: Self-Verified Math Proofs

Updated 1 December 2025
  • DeepSeekMath-V2 is a large language model that achieves self-verifiable mathematical reasoning by interleaving proof generation with learned verification.
  • It leverages a deep transformer with a 128,000-token context and block-sparse attention to effectively process long, multi-stage proofs.
  • The system sets a new standard on advanced benchmarks, demonstrating exceptional performance in open-ended mathematical problem solving.

DeepSeekMath-V2 is an LLM system targeting self-verifiable mathematical reasoning, with a focus on rigorous step-by-step theorem proving and proof verification. Addressing the limitations of final-answer-centric reward mechanisms, DeepSeekMath-V2 interleaves the training and deployment of a proof generator and a learned verifier, culminating in a self-improving closed-loop pipeline. The system demonstrates exceptional performance on advanced mathematical benchmarks, establishing a new standard for machine reasoning in open-ended mathematical problem solving (Shao et al., 27 Nov 2025).

1. Model Architecture

Both the proof generator and theorem verifier in DeepSeekMath-V2 share an identical backbone architecture based on the “DeepSeek-V3.2-Exp-Base” deep transformer model. Key features include a 128,000-token context window employing block-sparse attention, enabling the processing of long mathematical proofs and multi-stage reasoning.
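
The exact sparse-attention pattern is not described here, so the following NumPy sketch of a generic causal block-sparse mask (a local window plus strided global blocks, with illustrative `block_size` and `stride` parameters) only conveys the general mechanism:

```python
import numpy as np

def block_sparse_mask(seq_len: int, block_size: int, stride: int) -> np.ndarray:
    """Causal block-sparse attention mask: each query block attends to
    itself, its previous block, and every `stride`-th earlier block.
    The actual DeepSeek-V3.2-Exp-Base pattern is not specified here."""
    n_blocks = -(-seq_len // block_size)  # ceiling division
    blocks = np.zeros((n_blocks, n_blocks), dtype=np.int8)
    for q in range(n_blocks):
        blocks[q, max(0, q - 1):q + 1] = 1  # local window
        blocks[q, 0:q + 1:stride] = 1       # strided "global" blocks
    # Expand the block-level pattern to token level, then enforce causality.
    mask = np.kron(blocks, np.ones((block_size, block_size), dtype=np.int8))
    mask = mask[:seq_len, :seq_len].astype(bool)
    return mask & np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Attention cost grows with the number of attended blocks per row rather
# than with seq_len^2, which is what makes a 128,000-token context tractable.
m = block_sparse_mask(seq_len=4096, block_size=128, stride=4)
print(f"attended fraction: {m.mean():.3f}")
```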

Essential architectural details such as the number of layers, model width, and attention head count remain proprietary, but both generator and verifier operate at a scale consistent with current large autoregressive LLMs, each fine-tuned for its specialized task. The system introduces the following notation:

  • $X$: problem statement
  • $Y$: candidate proof
  • $V$: verifier's analysis (identification of issues and a summary score)
  • $\mathcal{I}_v$: high-level rubric specifying verification requirements
  • $\mathcal{I}_{mv}$: meta-verification rubric for verifying the verifier's analysis

The generator and verifier are thus tightly coupled, with their outputs and analyses linking procedural and meta-analytical layers throughout the training and evaluation processes.

2. Training Objectives and Loss Formulations

2.1 Theorem Verifier

Proof verification is formulated as a reinforcement learning (RL) task with a composite reward function:

  • Format Reward:

$$R_{\mathrm{format}}(V) = \begin{cases} 1 & \text{if } V \text{ contains the required issue summary and } \boxed{\cdot} \text{ score format} \\ 0 & \text{otherwise} \end{cases}$$

  • Score Reward:

$$R_{\mathrm{score}}(s', s) = 1 - |s' - s|, \qquad s', s \in \{0, 0.5, 1\}$$

The verifier's RL objective is

$$\max_{\pi}\; \mathbb{E}_{(X,Y,s)\sim\mathcal{D}_v,\;(V,s')\sim\pi(\cdot \mid X, Y)}\left[R_{\mathrm{format}}(V)\cdot R_{\mathrm{score}}(s',s)\right]$$

Meta-verification introduces an additional reward $R_{\mathrm{meta}}$ evaluating the faithfulness of the listed issues, so the total reward becomes

$$R_V = R_{\mathrm{format}}(V)\times R_{\mathrm{score}}(s',s)\times R_{\mathrm{meta}}$$

A separate meta-verifier is trained with the same reward structure on meta-verification tasks.
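
A minimal sketch of this composite reward in Python, assuming the boxed-score convention above; the regex, helper names, and the way the meta-reward is supplied are illustrative, not the paper's implementation:

```python
import re

ALLOWED_SCORES = {0.0, 0.5, 1.0}
BOXED = re.compile(r"\\boxed\{(0(?:\.5)?|1)\}")  # assumed score format

def format_reward(verification: str) -> float:
    """R_format: 1 if the analysis carries a boxed score in {0, 0.5, 1}
    (a stand-in for the paper's full issue-summary format check)."""
    return 1.0 if BOXED.search(verification) else 0.0

def score_reward(predicted: float, label: float) -> float:
    """R_score(s', s) = 1 - |s' - s| with s', s in {0, 0.5, 1}."""
    assert predicted in ALLOWED_SCORES and label in ALLOWED_SCORES
    return 1.0 - abs(predicted - label)

def verifier_reward(verification: str, label: float, meta_reward: float) -> float:
    """Total reward R_V = R_format * R_score * R_meta. `meta_reward`
    would come from the trained meta-verifier; here it is a plain float."""
    if format_reward(verification) == 0.0:
        return 0.0
    predicted = float(BOXED.search(verification).group(1))
    return score_reward(predicted, label) * meta_reward

# A verification that finds a genuine flaw (gold label 0.5) and scores 0.5:
v = "Step 3 divides by a possibly-zero term. Final score: \\boxed{0.5}"
print(verifier_reward(v, label=0.5, meta_reward=1.0))  # -> 1.0
```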

2.2 Proof Generator

The proof generator is optimized by RL, directly using the learned verifier as its reward model. The generator samples a proof $Y$ and, uniquely, is incentivized to conduct self-evaluation by producing its own analysis $Z$ with a self-assigned score $s'$:

$$R = R_{\mathrm{format}}(Y,Z) \times \left[\alpha R_Y + \beta R_Z\right], \qquad \alpha = 0.76,\; \beta = 0.24$$

where

$$R_Y = s, \qquad R_Z = R_{\mathrm{score}}(s',s) \times R_{\mathrm{meta}}(Z)$$

This blended reward explicitly balances the generator's solution quality and its self-verification fidelity as judged by the verifier and meta-verifier.
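
The arithmetic of the blended reward is simple enough to state directly; in the sketch below, all scores are assumed to be precomputed by the verifier and meta-verifier, and only the formula from this section is implemented:

```python
ALPHA, BETA = 0.76, 0.24  # weights reported for the blended reward

def generator_reward(format_ok: bool,
                     verifier_score: float,  # s: verifier's score of proof Y
                     self_score: float,      # s': generator's self-assigned score
                     meta_reward: float) -> float:
    """R = R_format(Y, Z) * [alpha * R_Y + beta * R_Z], with R_Y = s and
    R_Z = R_score(s', s) * R_meta(Z). Inputs are assumed precomputed."""
    if not format_ok:
        return 0.0
    r_y = verifier_score
    r_z = (1.0 - abs(self_score - verifier_score)) * meta_reward
    return ALPHA * r_y + BETA * r_z

# A correct proof (s = 1) whose self-analysis also says 1 and is judged
# faithful earns the maximum reward: 0.76 * 1 + 0.24 * 1 = 1.0.
print(generator_reward(True, verifier_score=1.0, self_score=1.0, meta_reward=1.0))
```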

3. Self-Verification and Automated Labeling Pipeline

A closed-loop, self-verification protocol orchestrates inference and ongoing improvement. During candidate proof evaluation, the following algorithmic sequence is executed (a code sketch follows the list):

  1. Generate $n$ independent verifications $\{V^{(i)}\}$ for candidate proof $Y$.
  2. For verifications with score $< 1$, invoke $m$ meta-verifications $M^{(i,j)}$.
  3. Validate $V^{(i)}$ if the majority of $\{M^{(i,j)}\}_j$ confirm its conclusions.
  4. Assign the lowest valid score $s_{\min}$ across all valid $V^{(i)}$ as the label for $Y$, if at least $k$ such verifications agree.
  5. If no issues are found in any verification, assign label 1; otherwise, defer labeling to human annotators.
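
A minimal sketch of this labeling loop, where `verify` and `meta_verify` stand in for sampled model calls and the defaults for $n$, $m$, $k$ are hypothetical (the paper's values are not given in this summary):

```python
def label_proof(X, Y, verify, meta_verify, n=8, m=4, k=2):
    """Automated labeling loop (steps 1-5 above). `verify(X, Y)` returns
    (score, analysis); `meta_verify(X, Y, analysis)` returns True if the
    analysis's issues hold up. Returns a score label for Y, or None to
    defer the example to human annotators."""
    verifications = [verify(X, Y) for _ in range(n)]                # step 1

    valid_scores = []
    for score, analysis in verifications:
        if score == 1.0:
            continue  # no issues claimed, nothing to meta-verify
        checks = [meta_verify(X, Y, analysis) for _ in range(m)]    # step 2
        if sum(checks) > m / 2:                                     # step 3
            valid_scores.append(score)

    if valid_scores:
        s_min = min(valid_scores)
        if valid_scores.count(s_min) >= k:                          # step 4
            return s_min
        return None  # issues raised, but without enough agreement
    if all(score == 1.0 for score, _ in verifications):             # step 5
        return 1.0
    return None  # some claimed issues failed meta-verification: defer
```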

This pipeline supports two critical functions:

  • Autonomous generation of high-quality training data for the verifier, especially on newly encountered, challenging proofs.
  • High-compute search at test time, incorporating repeated candidate generation, multi-threaded verification, and iterative refinement of proofs until consensus is achieved.

4. Empirical Performance and Benchmark Results

DeepSeekMath-V2 has been evaluated on both in-house and public competition datasets. The salient results are:

| Competition | Problems | Solved | Score |
|-------------|----------|--------|-------|
| IMO 2025    | 6        | 5 fully, 0 partially  | 83.3% |
| CMO 2024    | 6        | 4 fully, 1 partially  | 73.8% |
| Putnam 2024 | 12       | 11 fully, 1 partially | 98.3% |

On a 91-problem CNML-level internal benchmark, DeepSeekMath-V2 outperforms other leading LLMs (GPT-5-Thinking-High, Gemini 2.5-Pro) in algebra, geometry, number theory, combinatorics, and inequalities, with correctness adjudicated by majority over 8 verifier analyses.

Ablation studies varying the number of self-verification loops (from 1 to 8) reveal monotonic improvements in Pass@1 and Best@32, indicating the tangible benefit of iterative refinement for proof quality.
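
Concretely, Pass@1 measures single-sample accuracy, while Best@32 selects one of 32 sampled proofs per problem; the sketch below assumes Best@32 picks the candidate the verifier scores highest, which is a plausible reading rather than a protocol stated here:

```python
def pass_at_1(first_sample_correct: list[bool]) -> float:
    """Pass@1: fraction of problems solved by a single sampled proof."""
    return sum(first_sample_correct) / len(first_sample_correct)

def best_at_n(problems: list[list[tuple[float, bool]]]) -> float:
    """Best@n: each problem holds (verifier_score, is_correct) pairs for
    its n samples; the highest-scored candidate is selected and counted
    only if it is actually correct (selection rule assumed)."""
    hits = sum(max(cands, key=lambda c: c[0])[1] for cands in problems)
    return hits / len(problems)

# Two problems, n = 3 samples each:
print(best_at_n([
    [(0.5, False), (1.0, True), (0.0, False)],  # verifier picks the right proof
    [(1.0, False), (0.5, True), (0.5, True)],   # verifier is fooled -> a miss
]))  # -> 0.5
```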

Expert grading confirms model performance exceeds gold-medal thresholds and that the Putnam 2024 score (98.3%, 118/120) surpasses all known human efforts for that exam. Formal significance testing is not reported.

5. Key Insights and Theoretical Implications

Empirical findings indicate that self-verification substantially narrows the gap between generation and verification: compelling the generator to audit and rectify its own proofs before external evaluation yields demonstrably stronger final solutions than simply optimizing for final answers. The inclusion of a meta-verifier, tasked with verifying the verifier, proves crucial for enforcing faithful, non-hallucinatory issue discovery. Automated labeling via scaled multi-stage verification ensures scalability, drastically reducing reliance on manual annotation for challenging problems.

6. Limitations and Open Challenges

Despite these advances, DeepSeekMath-V2 exhibits limitations:

  • Top-level IMO-hard and advanced-ProofBench problems remain challenging, suggesting the verifier and generator insufficiently capture intricate logical nuances.
  • No formal integration with automated proof assistants such as Lean or Isabelle is provided, meaning the pipeline remains semi-informal; bridging this with formal proof systems is recognized as an important next direction.
  • Model architecture specifications (size, depth, width) are not publicly disclosed, limiting reproducibility and systemic analysis.
  • Exploration of smaller, open variants is highlighted as a means to democratize access to self-verifiable mathematical reasoning.

7. Broader Significance and Future Directions

DeepSeekMath-V2 demonstrates that LLMs, trained within a multi-level verification framework, can generate not only plausible mathematical proofs but can develop the capability to self-audit, resolve issues, and deliver solutions with verifiable rigor. This architecture offers a blueprint for future AI systems aimed at trustworthy, self-guaranteed derivations, and provides an empirical foundation for advances in AI-driven scientific discovery and mathematical research (Shao et al., 27 Nov 2025).

References

  • Shao et al. DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning. 27 November 2025.