
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning (2504.11354v1)

Published 15 Apr 2025 in cs.AI

Abstract: We introduce Kimina-Prover Preview, an LLM that pioneers a novel reasoning-driven exploration paradigm for formal theorem proving, as showcased in this preview release. Trained with a large-scale reinforcement learning pipeline from Qwen2.5-72B, Kimina-Prover demonstrates strong performance in Lean 4 proof generation by employing a structured reasoning pattern we term the "formal reasoning pattern". This approach allows the model to emulate human problem-solving strategies in Lean, iteratively generating and refining proof steps. Kimina-Prover sets a new state-of-the-art on the miniF2F benchmark, reaching 80.7% with pass@8192. Beyond improved benchmark performance, our work yields several key insights: (1) Kimina-Prover exhibits high sample efficiency, delivering strong results even with minimal sampling (pass@1) and scaling effectively with computational budget, stemming from its unique reasoning pattern and RL training; (2) we demonstrate clear performance scaling with model size, a trend previously unobserved for neural theorem provers in formal mathematics; (3) the learned reasoning style, distinct from traditional search algorithms, shows potential to bridge the gap between formal verification and informal mathematical intuition. We open-source distilled versions of Kimina-Prover with 1.5B and 7B parameters.

Summary

Analyzing the Kimina-Prover Preview: Building Large Formal Reasoning Models through Reinforcement Learning

The research presents Kimina-Prover Preview, an LLM developed specifically for formal theorem proving in the Lean 4 proof assistant. Unlike existing approaches that pair LLMs with classical search algorithms such as Monte Carlo Tree Search (MCTS) or Best-First Search (BFS), the model follows a reasoning-driven exploration methodology grounded in reinforcement learning (RL): it relies on an internal structured reasoning style, termed the formal reasoning pattern, to emulate a human-like problem-solving process.

Key Findings and Results

Kimina-Prover Preview advances several facets of theorem proving. On the miniF2F benchmark it sets a new state-of-the-art (SotA), reaching an 80.7% pass rate with a sample budget of 8192 (pass@8192) and surpassing the previous best of 72.95% achieved by BFS Prover. It is also notably sample-efficient, performing strongly even at pass@1, and it scales well with both model size and computational budget.
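
For readers unfamiliar with the pass@k metric used above: the summary does not state how the authors compute it, but a common convention (the unbiased estimator popularized for code generation by Chen et al., 2021) is sketched below. The numbers in the usage example are illustrative only and are not taken from the paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n attempts is correct, given
    that c of the n attempts were verified correct."""
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only (not from the paper): 8192 attempts on one problem,
# 12 of which the Lean 4 checker accepts.
print(pass_at_k(8192, 12, 8192))  # 1.0    -- at least one valid proof in the full budget
print(pass_at_k(8192, 12, 1))     # ~0.0015 -- expected success rate of a single sample
```

At the full budget (k = n), the estimator simply reports whether any sampled attempt produced a proof the checker accepts; at small k it reflects per-sample efficiency.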

Performance also improves as model size grows, a trend not previously documented for neural theorem provers in formal mathematics. This indicates that Kimina-Prover can exploit increased model capacity to extend its reasoning capabilities, whereas prior models failed to show comparable gains from larger model sizes.

Additionally, the paper discusses how the model bridges formal verification and informal mathematical intuition: by aligning informal and formal reasoning through the formal reasoning pattern, it shows potential to integrate informal problem-solving strategies into a formal theorem-proving framework.

Methodological Insights

An autoformalization pipeline converts informal natural-language problems into formal Lean 4 statements, avoiding the cost and time of manually curating a formal problem set. On top of this, a structured expert iteration loop with LLM-based feedback improves both the diversity and the quality of the training data, making effective use of mixed-type data for RL training, as sketched below.
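
The paper's exact pipeline is not reproduced in this summary; the sketch below only illustrates the general shape of an autoformalization plus expert-iteration loop of the kind described. All helper callables (autoformalize, generate_proofs, lean_verify, llm_filter, train_on) are hypothetical placeholders supplied by the caller, not the authors' actual interfaces.

```python
from typing import Callable, List, Tuple

Statement = str   # a formal Lean 4 statement
Proof = str       # a candidate proof script

def expert_iteration(
    model,
    informal_problems: List[str],
    autoformalize: Callable[[str], Statement],
    generate_proofs: Callable[..., List[Proof]],
    lean_verify: Callable[[Statement, Proof], bool],
    llm_filter: Callable[[Statement, Proof], bool],
    train_on: Callable[..., object],
    rounds: int = 3,
    samples_per_statement: int = 32,
):
    # Translate natural-language problems into Lean 4 statements once, up front.
    statements = [autoformalize(p) for p in informal_problems]
    dataset: List[Tuple[Statement, Proof]] = []
    for _ in range(rounds):
        for stmt in statements:
            for cand in generate_proofs(model, stmt, n=samples_per_statement):
                # Only proofs accepted by the Lean 4 checker are kept ...
                if lean_verify(stmt, cand):
                    # ... and an LLM-based filter keeps the data diverse and well structured.
                    if llm_filter(stmt, cand):
                        dataset.append((stmt, cand))
        # Retrain (or RL-finetune) on the verified proofs before the next round.
        model = train_on(model, dataset)
    return model, dataset
```

The key design point is that the Lean checker, not the model, decides what enters the training set, so each round of self-generated data is formally verified before it can influence the next round of training.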

The formal reasoning pattern yields a decomposed, reflective proof style: the model interleaves informal reasoning with Lean code snippets, translating between structured human-like reasoning and machine-checkable formalization.
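
To make the target concrete, here is a minimal, illustrative Lean 4 (Mathlib) snippet in the spirit of a miniF2F-style statement, with the informal reasoning carried as a comment next to the formal step. It is a toy example of ours, not an output of Kimina-Prover or a problem from the paper.

```lean
-- Illustrative only: a miniF2F-style statement with informal reasoning kept as comments.
import Mathlib

example (a : ℝ) : 0 ≤ a ^ 2 + 1 := by
  -- Informally: a² is never negative, so adding 1 keeps the expression nonnegative.
  nlinarith [sq_nonneg a]
```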

Implications and Future Directions

The introduction of Kimina-Prover Preview has considerable implications for automated theorem proving, suggesting an evolutionary path where reinforcement learning supersedes conventional search algorithms. This transition could reduce computational overhead and lead to reasoning models that depend less on external search machinery. However, the model's limited training on formal data raises concerns about format collapse, and strategies for stabilizing RL training data and outputs remain an open area for future research.

Lastly, Kimina-Prover's emergent ability to produce human-like proof structures suggests intriguing applications beyond purely formal settings. Future research might focus on further exploiting informal reasoning data and methods to improve the model's adaptability and performance across diverse mathematical contexts.

In conclusion, Kimina-Prover Preview exemplifies a strategic pivot in theorem proving, leveraging reinforcement learning and internal reasoning patterns to achieve notable reasoning sophistication and sample efficiency. These results lay a solid foundation for continued advances in neural-network-driven formal reasoning systems.
