MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation (2505.10962v1)

Published 16 May 2025 in cs.AI

Abstract: Automated Theorem Proving (ATP) in formal languages remains a formidable challenge in AI, demanding rigorous logical deduction and navigating vast search spaces. While LLMs have shown promising performance, existing stepwise provers often suffer from biased search guidance, leading to inefficiencies and suboptimal proof strategies. This paper introduces the Multi-Perspective Search Prover (MPS-Prover), a novel stepwise ATP system designed to overcome these limitations. MPS-Prover incorporates two key innovations: a highly effective post-training data curation strategy that prunes approximately 40% of redundant training data without sacrificing performance, and a multi-perspective tree search mechanism. This search integrates a learned critic model with strategically designed heuristic rules to diversify tactic selection, prevent getting trapped in unproductive states, and enhance search robustness. Extensive evaluations demonstrate that MPS-Prover achieves state-of-the-art performance on multiple challenging benchmarks, including miniF2F and ProofNet, outperforming prior 7B parameter models. Furthermore, our analyses reveal that MPS-Prover generates significantly shorter and more diverse proofs compared to existing stepwise and whole-proof methods, highlighting its efficiency and efficacy. Our work advances the capabilities of LLM-based formal reasoning and offers a robust framework and a comprehensive analysis for developing more powerful theorem provers.

Authors (7)

Zhenwen Liang
Linfeng Song
Yang Li
Tao Yang
Feng Zhang
Haitao Mi
Dong Yu

Summary

A Technical Overview of MPS-Prover: Advancing Stepwise Theorem Proving

Introduction

The paper introduces MPS-Prover, a stepwise automated theorem proving (ATP) system. Stepwise ATP must combine rigorous logical deduction with search over very large proof spaces, and existing stepwise provers often suffer from biased search guidance that leads to inefficient search and suboptimal proof strategies. MPS-Prover addresses these problems with two contributions: a post-training data curation strategy and a multi-perspective tree search mechanism. The curation strategy prunes redundant training data, while the search mechanism diversifies tactic selection; together they make the search more robust and the resulting proofs more efficient.

Key Innovations

Post-Training Data Curation: The authors curate the data used for post-training, pruning approximately 40% of it as redundant without degrading model performance. By removing overly simplistic proofs and ineffective yet frequently used tactics, the curated dataset becomes more focused, which reduces overfitting and improves the model's handling of complex reasoning patterns.
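The summary names the two pruning criteria but not how they are measured. The following is a minimal sketch under stated assumptions: each training sample is taken to be a (theorem, proof state, tactic) triple, "overly simplistic" is approximated by the current model's solve rate on the theorem, and "ineffective yet frequently used" tactics are handled by capping each tactic's share of the data. The `Sample` class, the `curate` function, and both thresholds are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    """One stepwise training example: a proof state and the tactic applied to it."""
    theorem_id: str
    state: str
    tactic: str


def curate(samples, solve_rate, easy_threshold=0.9, max_tactic_share=0.2):
    """Hypothetical pruning pass over post-training data.

    - Drops samples from theorems the current model already solves at a rate
      >= easy_threshold (assumed proxy for "overly simplistic" proofs).
    - Caps how many samples any single tactic string may contribute
      (assumed proxy for "ineffective yet frequently used" tactics).
    """
    # Step 1: remove samples whose theorem is already easy for the model.
    kept = [s for s in samples if solve_rate.get(s.theorem_id, 0.0) < easy_threshold]

    # Step 2: cap each tactic at a fixed share of the remaining samples.
    cap = max(1, int(max_tactic_share * len(kept)))
    used = Counter()
    curated = []
    for s in kept:
        if used[s.tactic] < cap:
            curated.append(s)
            used[s.tactic] += 1
    return curated


if __name__ == "__main__":
    samples = [
        Sample("easy", "s0", "simp"),
        Sample("hard", "s1", "nlinarith"),
        Sample("hard", "s2", "nlinarith"),
        Sample("hard", "s3", "nlinarith"),
        Sample("hard", "s4", "linarith"),
        Sample("hard", "s5", "ring_nf"),
    ]
    solve_rate = {"easy": 1.0, "hard": 0.1}  # toy numbers, not from the paper
    print(len(curate(samples, solve_rate, max_tactic_share=0.4)))
```

Capping by raw tactic string is a blunt stand-in; a more faithful filter would likely judge a tactic by whether it actually makes progress on the proof state.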

Multi-Perspective Tree Search: This search method combines a learned critic model with strategically designed heuristic rules to prevent search stagnation and navigate the proof space more effectively. Each expansion step is evaluated from several perspectives, namely the critic's score and the heuristic rules, which diversifies tactic selection and reduces the risk of getting trapped in repetitive or unproductive states, a common pitfall for stepwise provers.
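One way to read multi-perspective expansion is to let each scorer nominate its own favorite candidate, so that a tactic the critic undervalues can still be explored. The sketch below assumes that reading and wraps it in a plain best-first search. The `select_children` and `multi_perspective_search` functions, the goal-length heuristic, the `"no goals"` terminal marker, and the toy integer "prover" are all hypothetical stand-ins for a real tactic generator, critic model, and proof assistant.

```python
import heapq
import itertools

SOLVED = "no goals"  # assumed marker for a state with no remaining goals


def select_children(state, candidates, perspectives, per_perspective=1):
    """Let each perspective nominate its top-ranked (tactic, child) pairs.

    perspectives: scoring functions f(state, tactic, child) -> float, higher is better.
    The result is de-duplicated, so a child favored by any single perspective
    is expanded even if the other perspectives rank it low.
    """
    chosen, seen = [], set()
    for score in perspectives:
        ranked = sorted(candidates, key=lambda c: score(state, c[0], c[1]), reverse=True)
        taken = 0
        for tactic, child in ranked:
            if taken == per_perspective:
                break
            if child not in seen:
                seen.add(child)
                chosen.append((tactic, child))
                taken += 1
    return chosen


def multi_perspective_search(root, expand, critic, perspectives, budget=100):
    """Best-first search ordered by critic value, with multi-perspective expansion.

    expand(state) -> list of (tactic, child_state); critic(state) -> float.
    Returns the tactic list that closes all goals, or None if the budget runs out.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares states or paths
    frontier = [(-critic(root), next(tie), root, [])]
    visited = {root}
    for _ in range(budget):
        if not frontier:
            return None
        _, _, state, path = heapq.heappop(frontier)
        for tactic, child in select_children(state, expand(state), perspectives):
            if child == SOLVED:
                return path + [tactic]
            if child in visited:  # never re-enqueue a repeated state
                continue
            visited.add(child)
            heapq.heappush(frontier, (-critic(child), next(tie), child, path + [tactic]))
    return None


# Toy stand-ins (hypothetical): the "theorem" is reducing an integer to 0.
def toy_expand(state):
    n = int(state)
    moves = [("sub1", n - 1), ("add1", n + 1)]
    if n % 2 == 0:
        moves.append(("half", n // 2))
    return [(t, SOLVED if m == 0 else str(m)) for t, m in moves]


def toy_critic(state):
    # Stand-in for a learned critic: states closer to 0 look more promising.
    return 0.0 if state == SOLVED else -float(int(state))


PERSPECTIVES = [
    lambda s, t, c: toy_critic(c),                    # critic perspective
    lambda s, t, c: -(0 if c == SOLVED else len(c)),  # heuristic: shorter goal text
]

if __name__ == "__main__":
    print(multi_perspective_search("8", toy_expand, toy_critic, PERSPECTIVES))
    # prints ['half', 'half', 'sub1', 'sub1']
```

Ordering the frontier by the critic alone while letting every perspective contribute children is one design choice among several; a real system could just as well mix perspectives into the frontier priority itself.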

Results and Comparison

MPS-Prover demonstrates strong performance on benchmarks such as miniF2F and ProofNet, surpassing previous state-of-the-art stepwise methods such as BFS-Prover in multiple settings. Its ability to generate shorter and more diverse proofs further highlights its efficacy. With a cumulative accuracy of 75.82% on miniF2F, MPS-Prover outperforms many small whole-proof models and approaches the performance of distilled versions of larger systems. On the more demanding ProofNet benchmark, it achieves a success rate of 32.97%, handling complex proofs better than other 7B parameter models. Its proofs have a mean length of 3.44 steps, a substantial improvement in conciseness over both Kimina-Prover and DeepSeek-Prover V2, whose proofs are notably longer.
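The summary does not define "cumulative accuracy". A common reading, assumed here, is the fraction of benchmark problems solved by at least one of several search runs or configurations; mean proof length is then the average number of tactic steps per found proof. The helper names and the run data below are made-up illustrations, not figures or code from the paper.

```python
from statistics import mean


def cumulative_accuracy(runs, num_problems):
    """Fraction of problems solved by at least one run (union of solved sets)."""
    solved = set().union(*runs)
    return len(solved) / num_problems


def mean_proof_length(proofs):
    """Average number of tactic steps per proof, where a proof is a list of tactics."""
    return mean(len(p) for p in proofs)


if __name__ == "__main__":
    # Toy data only: three runs over an 8-problem benchmark.
    runs = [{"p1", "p2"}, {"p2", "p3"}, {"p4"}]
    print(cumulative_accuracy(runs, num_problems=8))  # 0.5
    proofs = [["simp"], ["intro x", "linarith"], ["nlinarith [sq_nonneg a]", "ring_nf", "omega"]]
    print(mean_proof_length(proofs))
```

Because the union only grows as runs are added, cumulative accuracy is an upper envelope over the individual runs and should not be compared directly with single-pass success rates.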

Implications and Future Work

The contributions of MPS-Prover provide a robust framework and a comprehensive analysis that push forward the capabilities of LLM-based formal reasoning. Practically, it offers a more efficient and fault-tolerant system capable of aiding mathematicians in verifying solutions and proofs with higher accuracy. Theoretically, it contributes insights into hybrid prover development, proposing pathways that combine stepwise and whole-proof strategies. Such hybrid methods could potentially optimize lemma handling and introduce new ways to balance the depth and breadth of exploration in theorem proving.

The paper also outlines future directions, such as using reinforcement learning to refine both the heuristic rules and the critic model from direct proof assistant feedback. It suggests using larger base models, possibly combined with techniques such as Chain-of-Thought reasoning during tactic generation, to achieve higher accuracy and efficiency. Hybrid stepwise and whole-proof systems appear particularly promising, especially for handling complex intermediate lemmas.

In conclusion, MPS-Prover is a substantial advance in ATP that uses diverse search perspectives to improve proof efficiency, and it may inspire further research into multi-perspective and hybrid systems for AI-based theorem proving.
