DeepSeekMath-V2: Self-Verified Math Proofs
- DeepSeekMath-V2 is a large language model that achieves self-verifiable mathematical reasoning by interleaving proof generation with learned verification.
- It leverages a deep transformer with a 128,000-token context and block-sparse attention to effectively process long, multi-stage proofs.
- The system sets a new standard on advanced benchmarks, demonstrating exceptional performance in open-ended mathematical problem solving.
DeepSeekMath-V2 is an LLM system targeting self-verifiable mathematical reasoning, with a focus on rigorous step-by-step theorem proving and proof verification. Motivated by the limitations of final-answer-centric reward mechanisms, DeepSeekMath-V2 interleaves the training and deployment of a proof generator and a learned verifier, culminating in a self-improving closed-loop pipeline. The system demonstrates exceptional performance on advanced mathematical benchmarks, establishing a new standard for machine reasoning in open-ended mathematical problem solving (Shao et al., 27 Nov 2025).
1. Model Architecture
Both the proof generator and theorem verifier in DeepSeekMath-V2 share an identical backbone architecture based on the “DeepSeek-V3.2-Exp-Base” deep transformer model. Key features include a 128,000-token context window employing block-sparse attention, enabling the processing of long mathematical proofs and multi-stage reasoning.
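As an informal illustration of the block-sparse pattern, the following sketch builds a causal attention mask in which each token attends only within a sliding window of blocks. This is a generic construction for exposition; the actual DeepSeek-V3.2-Exp sparsity pattern and kernel are not public.

```python
import numpy as np

def block_sparse_causal_mask(seq_len: int, block: int, window_blocks: int) -> np.ndarray:
    """Boolean attention mask: token i attends causally to tokens in its own
    block and the preceding `window_blocks` blocks. Generic sketch only; the
    real DeepSeek-V3.2-Exp sparsity pattern is not publicly specified."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        first_block = max(0, i // block - window_blocks)
        mask[i, first_block * block : i + 1] = True  # causal within the window
    return mask

# Example: 16-token sequence, blocks of 4, attend to own block plus 1 previous.
print(block_sparse_causal_mask(16, block=4, window_blocks=1).sum())
```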
Essential architectural details such as the number of layers, model width, and attention head count remain proprietary, but both generator and verifier operate at a scale consistent with current large autoregressive LLMs, each fine-tuned for its specialized task. The system introduces the following notation:
- $q$: problem statement
- $p$: candidate proof
- $v$: verifier's analysis (identification of issues and a summary score)
- $\rho$: high-level rubric specifying verification requirements
- $\rho_{\text{meta}}$: meta-verification rubric for verifying the verifier's analysis
The generator and verifier are thus tightly coupled, with their outputs and analyses linking procedural and meta-analytical layers throughout the training and evaluation processes.
2. Training Objectives and Loss Formulations
2.1 Theorem Verifier
Proof verification is formulated as a reinforcement learning (RL) task with a composite reward function:
- Format Reward: $R_{\text{fmt}}(v) \in \{0, 1\}$, granted only when the analysis $v$ follows the required structure (an enumerated list of identified issues followed by a summary score).
- Score Reward: $R_{\text{score}}(v)$, measuring agreement between the verifier's summary score and the reference label for the candidate proof $p$.
The verifier's RL objective is to maximize the expected composite reward over sampled analyses, $\max_\theta \, \mathbb{E}_{v \sim \pi_\theta(\cdot \mid q, p, \rho)}\left[ R_{\text{fmt}}(v) \cdot R_{\text{score}}(v) \right]$. Meta-verification introduces an additional reward $R_{\text{meta}}(v)$ evaluating the faithfulness of the listed issues. The total reward becomes $R(v) = R_{\text{fmt}}(v) \left( R_{\text{score}}(v) + R_{\text{meta}}(v) \right)$. A separate meta-verifier is trained with the same structure on meta-verification tasks.
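A minimal sketch of this composite reward, assuming a binary format check, a score-agreement term, and the additive meta-reward combination given above; the `Analysis` container and all function names are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Analysis:
    issues: Optional[List[str]]   # identified issues, or None if malformed
    score: Optional[float]        # summary score in [0, 1], or None if missing

def verifier_reward(analysis: Analysis, reference_score: float,
                    meta_reward: float = 0.0) -> float:
    """Composite RL reward for one verifier analysis (illustrative sketch)."""
    # Format reward: 1 only if the analysis has the required structure;
    # it gates all remaining terms.
    if analysis.issues is None or analysis.score is None:
        return 0.0
    # Score reward: agreement between the summary score and the reference
    # label produced by the automated labeling pipeline.
    r_score = 1.0 - abs(analysis.score - reference_score)
    # Meta-verification reward (faithfulness of the listed issues) is added
    # on top, matching the combination R = R_fmt * (R_score + R_meta).
    return r_score + meta_reward
```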
2.2 Proof Generator
The proof generator is optimized by RL, directly using the learned verifier as its reward model. The generator samples a proof $p \sim \pi_\phi(\cdot \mid q)$ and, uniquely, is incentivized to conduct self-evaluation by producing its own analysis $v_{\text{self}}$ with self-assigned score $s_{\text{self}}$:

$$R_{\text{gen}} = \beta \, s_{\text{ver}}(p) + (1 - \beta) \, R_{\text{fid}}\left(s_{\text{self}}, v_{\text{self}}\right),$$

where $s_{\text{ver}}(p)$ is the score the learned verifier assigns to $p$, $R_{\text{fid}}$ rewards agreement between the generator's self-assessment and the judgment of the verifier and meta-verifier, and $\beta \in [0, 1]$ balances the two terms.
This blended reward explicitly balances the generator's solution quality and its self-verification fidelity as judged by the verifier and meta-verifier.
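Under the formulation above, the blended reward can be sketched as a convex combination; `beta`, the agreement term, and the meta-verifier gating are assumptions for illustration, not published details.

```python
def generator_reward(verifier_score: float, self_score: float,
                     self_analysis_faithful: bool, beta: float = 0.5) -> float:
    """Blended RL reward for the proof generator (illustrative sketch).

    verifier_score: score the learned verifier assigns to the sampled proof.
    self_score: score the generator assigns to its own proof.
    self_analysis_faithful: whether the meta-verifier confirms the
        generator's self-analysis.
    """
    # Solution quality: the external verifier's judgment of the proof.
    quality = verifier_score
    # Self-verification fidelity: agreement between the self-assigned score
    # and the verifier's score, gated by meta-verifier validation.
    fidelity = (1.0 - abs(self_score - verifier_score)) if self_analysis_faithful else 0.0
    # Convex blend; beta is an assumed hyperparameter, not a published value.
    return beta * quality + (1.0 - beta) * fidelity
```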
3. Self-Verification and Automated Labeling Pipeline
A closed-loop, self-verification protocol orchestrates inference and ongoing improvement. During candidate proof evaluation, the following algorithmic sequence is enacted:
- Generate $n$ independent verifications $v_1, \dots, v_n$ for candidate proof $p$.
- For each verification $v_i$ with score $s(v_i) < 1$, invoke meta-verifications $m_{i,1}, \dots, m_{i,k}$.
- Validate $v_i$ if the majority of its meta-verifications confirm its conclusions.
- Assign the lowest score across all validated $v_i$ as the label for $p$, if at least a threshold number of such verifications agree.
- If no issues are found in any verification, assign label 1; otherwise, defer labeling to human annotators.
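The sequence above condenses into a short labeling routine; `verify`, `meta_verify`, and the parameters `n`, `n_meta`, and `min_agree` below are assumed interfaces and values, sketched for exposition.

```python
from typing import Callable, List, Optional, Tuple

def label_proof(q: str, p: str,
                verify: Callable[[str, str], Tuple[float, List[str]]],
                meta_verify: Callable[[str, str, List[str]], bool],
                n: int = 8, n_meta: int = 3, min_agree: int = 2) -> Optional[float]:
    """Automated labeling of candidate proof p for problem q (sketch).

    verify(q, p) -> (score, issues): one independent verifier analysis.
    meta_verify(q, p, issues) -> bool: one meta-verification confirming
        whether the listed issues are genuine.
    Returns a score label, or None to defer to human annotators.
    """
    analyses = [verify(q, p) for _ in range(n)]
    if not any(issues for _, issues in analyses):
        return 1.0  # no verification found any issue: accept the proof

    validated_scores = []
    for score, issues in analyses:
        if score < 1.0:
            confirmations = [meta_verify(q, p, issues) for _ in range(n_meta)]
            if sum(confirmations) > n_meta / 2:   # majority of meta-checks agree
                validated_scores.append(score)

    if len(validated_scores) >= min_agree:
        return min(validated_scores)              # most conservative valid score
    return None                                   # ambiguous: defer to humans
```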
This pipeline supports two critical functions:
- Autonomous generation of high-quality training data for the verifier, especially on newly encountered, challenging proofs.
- High-compute search at test time, incorporating repeated candidate generation, multi-threaded verification, and iterative refinement of proofs until consensus is achieved.
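The test-time side of the loop can be sketched as repeated generate-verify-refine rounds until verifications reach consensus; `generate`, `refine`, and the majority stopping rule below are assumed interfaces rather than the released pipeline.

```python
from typing import Callable, List, Tuple

def search_proof(q: str,
                 generate: Callable[[str], str],
                 verify: Callable[[str, str], Tuple[float, List[str]]],
                 refine: Callable[[str, str, List[str]], str],
                 rounds: int = 4, n_verif: int = 8) -> str:
    """High-compute test-time search (illustrative sketch)."""
    proof = generate(q)
    for _ in range(rounds):
        results = [verify(q, proof) for _ in range(n_verif)]
        # Consensus: stop once a majority of verifications report no issues.
        clean = sum(1 for _, issues in results if not issues)
        if clean > n_verif / 2:
            return proof
        # Otherwise refine against the union of all reported issues.
        all_issues = [i for _, issues in results for i in issues]
        proof = refine(q, proof, all_issues)
    return proof  # best effort once the compute budget is exhausted
```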
4. Empirical Performance and Benchmark Results
DeepSeekMath-V2 has been evaluated on both in-house and public competition datasets. The salient results are:
| Competition (total problems) | Problems Solved | Score |
|---|---|---|
| IMO 2025 (6) | 5 fully, 0 partial | 83.3% |
| CMO 2024 (6) | 4 fully, 1 partial | 73.8% |
| Putnam 2024 (12) | 11 fully, 1 partial | 98.3% |
On a 91-problem CNML-level internal benchmark, DeepSeekMath-V2 outperforms other leading LLMs (GPT-5-Thinking-High, Gemini 2.5-Pro) in algebra, geometry, number theory, combinatorics, and inequalities, with correctness adjudicated by majority over 8 verifier analyses.
Ablation studies varying the number of self-verification loops (from 1 to 8) reveal monotonic improvements in Pass@1 and Best@32, indicating the tangible benefit of iterative refinement for proof quality.
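For reference, Pass@1 can be computed with the standard unbiased pass@k estimator, and Best@32 as verifier-guided selection over 32 samples; the report does not publish its evaluation code, so the following is a conventional reading of these metrics.

```python
from math import comb
from typing import Callable, List

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn from n attempts (of which c are correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def best_at_n(samples: List[str],
              verifier_score: Callable[[str], float],
              is_correct: Callable[[str], bool]) -> bool:
    """Best@n: pick the sample the verifier scores highest, then grade it.
    All interfaces here are illustrative."""
    best = max(samples, key=verifier_score)
    return is_correct(best)
```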
Expert grading confirms model performance exceeds gold-medal thresholds and that the Putnam 2024 score (98.3%, 118/120) surpasses all known human efforts for that exam. Formal significance testing is not reported.
5. Key Insights and Theoretical Implications
Empirical findings indicate that self-verification substantially narrows the gap between generation and verification: compelling the generator to audit and rectify its own proofs before external evaluation yields demonstrably stronger final solutions than optimizing for final answers alone. The inclusion of a meta-verifier, tasked with verifying the verifier, proves crucial for enforcing faithful, non-hallucinatory issue discovery. Automated labeling via scaled multi-stage verification ensures scalability, drastically reducing reliance on manual annotation for challenging tasks.
6. Limitations and Open Challenges
Despite these advances, DeepSeekMath-V2 exhibits limitations:
- Top-level IMO-hard and advanced ProofBench problems remain challenging, suggesting that the verifier and generator do not yet fully capture intricate logical nuances.
- No formal integration with automated proof assistants such as Lean or Isabelle is provided, meaning the pipeline remains semi-informal; bridging this with formal proof systems is recognized as an important next direction.
- Model architecture specifications (size, depth, width) are not publicly disclosed, limiting reproducibility and systemic analysis.
- Exploration of smaller, open variants is highlighted as a means to democratize access to self-verifiable mathematical reasoning.
7. Broader Significance and Future Directions
DeepSeekMath-V2 demonstrates that LLMs, trained within a multi-level verification framework, can generate not only plausible mathematical proofs but can develop the capability to self-audit, resolve issues, and deliver solutions with verifiable rigor. This architecture offers a blueprint for future AI systems aimed at trustworthy, self-guaranteed derivations, and provides an empirical foundation for advances in AI-driven scientific discovery and mathematical research (Shao et al., 27 Nov 2025).