- The paper presents a dual-role large language model (LLM) system that functions as both conjecturer and prover to mitigate training-data scarcity in theorem proving.
- It introduces a three-stage methodology—supervised finetuning, iterative self-play, and final retraining—that enhances proof generation capabilities.
- Experiments with the Lean and Isabelle verifiers demonstrate significant improvements, with the share of LeanWorkbook statements proved in Lean rising from 13.2% to 26.3%.
Overview of Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
The paper "Beyond Limited Data: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving" addresses a significant challenge in formal theorem proving with LLMs: the scarcity of high-quality training data. Traditional methods, such as reinforcement learning (RL) or expert iteration, attempt to mitigate this scarcity by iterating between generating proofs and fine-tuning on the correct ones; however, they often plateau because correct proofs remain in short supply.
The authors present an approach inspired by human mathematicians, who not only solve existing problems but also pose new conjectures. Their Self-play Theorem Prover (STP) is a dual-role system in which a single LLM operates as both conjecturer and prover. The conjectures and proofs it produces serve as self-generated training data, enabling continuous learning without additional external data.
Methodology
The STP model is constructed to perform in three stages:
- Model Initialization via Supervised Finetuning (SFT): The LLM is fine-tuned to play dual roles. The prover is trained on existing theorem-proof pairs to learn proof generation, while the conjecturer is exposed to a subset of known results to encourage the generation of novel conjectures.
- Self-play Training: This stage involves iterative and interactive learning between the conjecturer and prover. The conjecturer generates new, related conjectures based on a seed theorem and its proof, while the prover attempts to prove both existing statements and newly generated conjectures. Successful proofs provide feedback to improve both roles, with an emphasis on conjectures that are neither too easy nor impossible to prove (i.e., having a low but positive pass rate).
- Final Re-training: To ensure stability and effectiveness, the final model is retrained from the base model using proofs collected throughout the STP iterations, consolidating learning advancements.
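The pass-rate filter at the heart of the self-play stage can be sketched as a minimal Python loop. This is an illustrative sketch only: the names `conjecture_fn` and `prove_fn`, the sampling budget of 8 attempts, and the 0.25 pass-rate threshold are assumptions for demonstration, not the paper's actual components or hyperparameters.

```python
def self_play_round(seed_theorems, conjecture_fn, prove_fn,
                    num_samples=8, max_pass_rate=0.25):
    """One illustrative round of self-play: generate a conjecture per seed
    theorem, sample proof attempts, and keep conjectures whose empirical
    pass rate is low but positive as new training data."""
    training_examples = []
    for theorem in seed_theorems:
        conjecture = conjecture_fn(theorem)
        # Sample several proof attempts; prove_fn returns a proof or None.
        attempts = [prove_fn(conjecture) for _ in range(num_samples)]
        successes = [p for p in attempts if p is not None]
        pass_rate = len(successes) / num_samples
        # Keep only hard-but-provable conjectures (0 < pass rate <= threshold).
        if 0 < pass_rate <= max_pass_rate:
            training_examples.append((conjecture, successes[0]))
    return training_examples

# Toy stand-ins (hypothetical): thmA's conjecture is provable 1 time in 8
# (kept), thmB's always succeeds (too easy), thmC's never does (too hard).
allowed = {"thmA_variant": 1, "thmB_variant": 8, "thmC_variant": 0}
calls = {}
def toy_prover(conjecture):
    calls[conjecture] = calls.get(conjecture, 0) + 1
    return "toy proof" if calls[conjecture] <= allowed[conjecture] else None

data = self_play_round(["thmA", "thmB", "thmC"],
                       lambda t: t + "_variant", toy_prover)
print([c for c, _ in data])  # ['thmA_variant']
```

In the paper's setting the pass rate would come from a formal verifier (Lean or Isabelle) checking sampled proofs; the toy prover above merely stands in for that check to show how the filter shapes the self-generated training set.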
Results and Implications
Experimentation with the Lean and Isabelle formal verifiers yielded notable results. With Lean, STP proved 26.3% of the statements in the LeanWorkbook dataset, well above the previous best of 13.2%. On benchmarks such as miniF2F-test and ProofNet-test, STP achieved state-of-the-art performance among whole-proof generation methods across a range of sampling budgets.
These findings suggest that STP can scale theorem-proving capability efficiently by maintaining a diverse and dynamically challenging training distribution. The approach points toward AI progress on reasoning tasks that is not bounded by existing datasets, with potential impact on formal mathematics and other domains requiring complex reasoning.
Observations and Future Directions
Integrating self-generated conjectures as a continuous source of challenging tasks marks a significant shift in how LLMs are trained for theorem proving. STP's adaptability suggests a promising way to handle sparse-data settings, which matters for any application that must advance beyond the boundaries of existing training corpora.
Future research could refine the conjecturer's ability to generate more balanced and topic-diverse conjectures, optimizing for long-term gains in prover capability. Applying STP to natural-language theorem translation and broader AI-driven conjecturing could also inform AGI research. Extending the methodology to other formal proof systems, or integrating it with step-based prover models, offers further avenues for improvement.
In conclusion, the work highlights strategic innovation in AI for mathematical reasoning, underscoring the significance of self-generated tasks in data-limited contexts and paving the way for autonomous reasoning systems in unexplored realms of mathematics and beyond.