
HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving (2412.20735v3)

Published 30 Dec 2024 in cs.AI and cs.CL

Abstract: We introduce HunyuanProver, an LLM finetuned from the Hunyuan 7B for interactive automatic theorem proving with LEAN4. To alleviate the data sparsity issue, we design a scalable framework to iteratively synthesize data with low cost. Besides, guided tree search algorithms are designed to enable effective system 2 thinking of the prover. HunyuanProver achieves state-of-the-art (SOTA) performances on major benchmarks. Specifically, it achieves a pass of 68.4% on the miniF2F-test compared to 65.9%, the current SOTA results. It proves 4 IMO statements (imo_1960_p2, imo_1962_p2, imo_1964_p2 and imo_1983_p6) in miniF2F-test. To benefit the community, we will open-source a dataset of 30k synthesized instances, where each instance contains the original question in natural language, the converted statement by autoformalization, and the proof by HunyuanProver.

Summary

  • The paper shows HunyuanProver enhances automated theorem proving by integrating scalable data synthesis with iterative data expansion to overcome data sparsity.
  • It introduces a guided tree search that employs best-first and Monte Carlo tree search algorithms, including a Distance Critic model for efficient proof navigation.
  • On the miniF2F-test benchmark, HunyuanProver reaches 68.4% accuracy (previous best: 65.9%) and proves four IMO statements.

HunyuanProver: A Scalable Framework for Automated Theorem Proving

The paper introduces HunyuanProver, an LLM finetuned from Hunyuan 7B for interactive theorem proving in LEAN4. Two obstacles motivate the design: formal proof data is scarce, and the search space for Olympiad-level problems is very large. The authors address the first with a scalable data synthesis framework and the second with guided tree search.

Overview of Contributions

HunyuanProver rests on two components: a scalable data generation module and a guided tree search method. The former trains initial models on open-source data and then runs an iterative procedure that keeps producing new proof data, which expands the dataset and eases the usual bottleneck of insufficient training data. At test time, the prover relies on tree search algorithms designed to imitate 'system 2 thinking', which raises the likelihood of solving complex problems.

HunyuanProver reaches 68.4% accuracy on the miniF2F-test, above the previous state-of-the-art result of 65.9%. Within that test set it proves four International Mathematical Olympiad (IMO) statements: imo_1960_p2, imo_1962_p2, imo_1964_p2 and imo_1983_p6.

Methodological Advances

Scalable Data Generation

The paper argues that existing datasets are too small for theorem proving and addresses this in two ways: autoformalization and tactic-level data generation. First, an autoformalization model trained on multilingual data converts a large volume of natural language math problems into formal statements. This model enables the conversion of tens of millions of math statements into LEAN format, supplemented by strategies aimed at increasing diversity.
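To make the autoformalization step concrete, the following is a small illustration of what a natural language problem and its LEAN4 counterpart can look like. The problem, the theorem name sum_sq_ge_two_mul, and the proof are hypothetical and are not taken from the paper or its dataset; the snippet assumes Mathlib is available.

```lean
import Mathlib

-- Natural-language problem (illustrative only):
--   "Show that a^2 + b^2 ≥ 2ab for all real numbers a and b."
-- A possible formal statement, followed by a short tactic proof.
theorem sum_sq_ge_two_mul (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  -- The inequality follows from (a - b)^2 ≥ 0.
  nlinarith [sq_nonneg (a - b)]
```

An instance in the dataset described in the abstract would pair three such pieces: the original question, the formalized statement, and a proof found by the prover.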

Second, the iterative data-generation framework refines the prover in a loop: in each iteration the current prover searches for proofs, LEAN's internal engine is used to generate new tactic data, and the new proof trajectories are added to the training set for the next iteration. This process scales the dataset up to billions of tokens while improving the prover with trajectories it could not find in earlier rounds.
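The loop described above can be sketched roughly as follows. This is a minimal toy under stated assumptions, not the paper's implementation: the function names expert_iteration, toy_prove and toy_train are hypothetical, the "prover" is just an integer skill level, and a "statement" is an integer difficulty. In a real system, prove would run tree search checked by LEAN and train would finetune the model.

```python
def expert_iteration(statements, prove, train, prover, rounds=3):
    """Alternate proof search and retraining, keeping every verified proof."""
    dataset = {}  # statement -> verified proof
    for _ in range(rounds):
        for stmt in statements:
            if stmt in dataset:
                continue  # already proved in an earlier round
            proof = prove(prover, stmt)
            if proof is not None:
                dataset[stmt] = proof
        # Retrain on everything proved so far (hypothetical stand-in).
        prover = train(prover, dataset)
    return prover, dataset


# Toy stand-ins (assumptions for illustration only).
def toy_prove(skill, difficulty):
    """Succeed only when the prover's skill covers the difficulty."""
    return f"proof-of-{difficulty}" if skill >= difficulty else None


def toy_train(skill, dataset):
    """More proved statements make the toy prover stronger."""
    return 1 + len(dataset)


final_prover, proofs = expert_iteration([1, 2, 3, 4, 5], toy_prove, toy_train, prover=1)
print(final_prover, sorted(proofs))
```

In this toy, each round unlocks exactly one harder statement, which mirrors the idea that proofs found in one iteration make the next iteration's prover able to reach further.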

Guided Tree Search

The guided tree search component goes beyond sampling whole proofs independently. The authors implement best-first search (abbreviated BFS in the paper, not to be confused with breadth-first search) and Monte Carlo tree search (MCTS), each guided by critic models. Notably, the Distance Critic, which predicts the number of steps still needed to complete a proof, produced marked improvements in results, showing its value for navigating complex proof spaces.
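As a rough illustration of best-first search guided by a distance-style critic, consider the sketch below. It is a toy under stated assumptions rather than the authors' algorithm: proof states are integers, the goal is to reach 0, the two "tactics" halve and dec are invented, and toy_critic stands in for a learned model that estimates remaining steps. All function names are hypothetical.

```python
import heapq
from itertools import count


def best_first_search(root, expand, is_proved, critic, budget=100):
    """Expand the state with the lowest critic score first.

    Returns the list of tactics that closes `root`, or None when the
    expansion budget is exhausted.
    """
    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(critic(root), next(tie), root, [])]
    seen = {root}
    while frontier and budget > 0:
        _, _, state, path = heapq.heappop(frontier)
        if is_proved(state):
            return path
        budget -= 1
        for tactic, child in expand(state):
            if child not in seen:
                seen.add(child)
                heapq.heappush(
                    frontier, (critic(child), next(tie), child, path + [tactic])
                )
    return None


# Toy environment (assumptions for illustration only).
def toy_expand(n):
    """Yield (tactic, next_state) pairs available from state n."""
    if n > 0 and n % 2 == 0:
        yield "halve", n // 2
    if n > 0:
        yield "dec", n - 1


def toy_critic(n):
    """Stand-in for a distance critic: a crude estimate of remaining steps."""
    return n.bit_length()


proof = best_first_search(10, toy_expand, lambda n: n == 0, toy_critic)
print(proof)
```

The priority queue is the only place the critic enters: swapping toy_critic for a different scoring function (for example, one based on policy confidence) changes the search order without touching the rest of the loop.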

Evaluation and Implications

Evaluations were conducted on the miniF2F benchmark, where HunyuanProver reaches 68.4% on the test split against a previous best of 65.9%. The iterative method of data expansion played a crucial role in raising accuracy, which points to the importance of large-scale data and effective curation strategies.

The researchers also explored the integral role of critics in tree search methodologies, highlighting the benefits of using policy confidence and process reward models in tandem with state exploration mechanisms. Analysis indicated that integrating these techniques improved accuracy and search efficiency significantly.

Future Prospects

The framework proposed in this paper sets a promising direction for further work in automated theorem proving. Potential avenues for exploration include improving the efficiency of tree search algorithms, refining data curation strategies, and integrating more capable critic models. Progress on these fronts would move the field toward robust, scalable theorem-proving systems usable across mathematical and computational domains.

In summary, HunyuanProver is a substantial step forward in large-scale automated theorem proving and shows what modern LLMs can achieve when scalable data generation is combined with guided search.
