Mathesis: Towards Formal Theorem Proving from Natural Languages (2506.07047v1)

Published 8 Jun 2025 in cs.AI

Abstract: Recent advances in LLMs show strong promise for formal reasoning. However, most LLM-based theorem provers have long been constrained by the need for expert-written formal statements as inputs, limiting their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem proving pipeline processing informal problem statements. It contributes Mathesis-Autoformalizer, the first autoformalizer using reinforcement learning to enhance the formalization ability of natural language problems, aided by our novel LeanScorer framework for nuanced formalization quality assessment. It also proposes a Mathesis-Prover, which generates formal proofs from the formalized statements. To evaluate the real-world applicability of end-to-end formal theorem proving, we introduce Gaokao-Formal, a benchmark of 488 complex problems from China's national college entrance exam. Our approach is carefully designed, with a thorough study of each component. Experiments demonstrate Mathesis's effectiveness, with the autoformalizer outperforming the best baseline by 22% in pass-rate on Gaokao-Formal. The full system surpasses other model combinations, achieving 64% accuracy on MiniF2F with pass@32 and a state-of-the-art 18% on Gaokao-Formal.

Summary

The paper presents Mathesis, an end-to-end pipeline that takes problems stated in informal natural language, formalizes them with an autoformalizer trained by reinforcement learning (GRPO), and then generates formal proofs.
It introduces LeanScorer, a tool that uses the Sugeno fuzzy integral to produce nuanced semantic evaluations that align closely with human judgments.
It also introduces the Gaokao-Formal benchmark and Mathesis-Prover. The autoformalizer outperforms the best baseline by 22% in pass rate on Gaokao-Formal, and the full system reaches 64% on MiniF2F (pass@32) and 18% on Gaokao-Formal.

Insights on "Mathesis: Towards Formal Theorem Proving from Natural Languages"

The paper "Mathesis: Towards Formal Theorem Proving from Natural Languages" addresses the gap between informal natural language problem statements and the formal languages that automated theorem provers require. It develops Mathesis, an end-to-end theorem proving pipeline, and introduces an evaluation method and a benchmark aimed at real-world problems.
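To make the gap concrete, here is a minimal sketch of what an autoformalizer is expected to produce for a simple informal problem. The problem, the theorem name, and the use of Lean 4 with Mathlib are assumptions chosen for illustration; they are not taken from the paper or from Gaokao-Formal. The proof is left as `sorry`, since filling it in is the prover's job.

```lean
import Mathlib

-- Informal problem (hypothetical): "Show that x^2 - 2x + 1 ≥ 0 for every real number x."
-- An autoformalizer would emit a statement like the one below;
-- a prover would then replace `sorry` with a machine-checked proof.
theorem hypothetical_example (x : ℝ) : x ^ 2 - 2 * x + 1 ≥ 0 := by
  sorry
```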

Contributions and Methodologies

At the heart of this paper is Mathesis, which seeks to automate the process of converting informal problem statements into formal mathematical proofs. This pipeline incorporates several novel components:

Mathesis-Autoformalizer: This model uses reinforcement learning to translate natural language problems into formal statements. It is trained with Group Relative Policy Optimization (GRPO), with feedback that combines syntactic validity checks and semantic correctness evaluations. Further iterative training with Hierarchical Preference Optimization aligns its formalizations with what the downstream prover can actually prove.
LeanScorer: LeanScorer is a semantic evaluation tool that assesses formalization quality beyond basic correctness checks. It aggregates its judgments with the Sugeno fuzzy integral, which yields fine-grained quality categories that are consistent with human assessments.
Gaokao-Formal Benchmark: A novel benchmark comprising 488 complex mathematical problems sourced from China's National Higher Education Entrance Examination, Gaokao-Formal enriches the field of theorem proving with diverse challenges, capturing intricacies of multi-domain mathematical topics. This benchmark aims to test and refine the capabilities of formal reasoning models in dealing with real-world problem complexities.
Mathesis-Prover: The prover component generates machine-verifiable proofs from the formalized statements. It is trained with expert iteration, so it keeps improving as the training problems grow in diversity and difficulty.
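Two of the techniques named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the cardinality-based fuzzy measure, the sub-scores in [0, 1], the binary rewards, and all function names are hypothetical. The first function computes a generic Sugeno fuzzy integral; the second computes the group-relative advantage that GRPO is built on, where each sampled output is scored against the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def cardinality_measure(k, n):
    """Hypothetical fuzzy measure: a subset of size k out of n has weight k/n."""
    return k / n


def sugeno_integral(scores, measure=cardinality_measure):
    """Sugeno fuzzy integral of scores in [0, 1].

    Sort scores in descending order; the result is the maximum over k of
    min(k-th largest score, measure of the top-k subset).
    """
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    return max(min(h, measure(k, n)) for k, h in enumerate(ordered, start=1))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Uses the population standard deviation; eps avoids division by zero
    when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical per-subtask match scores for one candidate formalization.
    print(sugeno_integral([1.0, 1.0, 0.5, 0.0]))
    # Hypothetical rewards for four candidate formalizations of one problem.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With the cardinality measure, the integral rewards a formalization only to the extent that a large share of its sub-scores are high, which is one way to get a graded score rather than a pass/fail verdict. The measure actually used by LeanScorer may differ.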

Experimental Results

The experiments show gains in both autoformalization and end-to-end theorem proving. Mathesis-Autoformalizer outperforms the best baseline by 22% in pass rate on the Gaokao-Formal benchmark, which suggests it handles more nuanced and complex translations from informal to formal statements. The full pipeline, which pairs the autoformalizer with Mathesis-Prover, surpasses other model combinations, reaching 64% accuracy on MiniF2F at pass@32 and a state-of-the-art 18% on Gaokao-Formal.
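The abstract reports MiniF2F accuracy at pass@32. As a reading aid, the sketch below shows one common way to compute pass@k: the combinatorial estimator widely used in code-generation evaluation, which gives the probability that at least one of k attempts drawn from n samples (c of them correct) succeeds. This summary does not state which estimator the authors use, so treat the function names and the choice of estimator as assumptions.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k attempts, drawn without replacement
    from n samples of which c are correct, is correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; `results` is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes: 32 proof attempts per problem, c verified proofs each.
    results = [(32, 0), (32, 1), (32, 5), (32, 0)]
    print(benchmark_pass_at_k(results, k=32))
```

When k equals n, as in the usage above, the estimator reduces to the fraction of problems with at least one verified proof.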

Implications and Future Directions

This research points toward automated systems that can take real-world problems expressed in natural language and reason about them formally. Practically, removing the dependency on manually formalized input statements makes Mathesis applicable across a range of mathematical domains. Theoretically, it highlights reinforcement learning and preference optimization as useful strategies for translating natural language into formal statements.

Future work could explore unified model architectures that cover every stage from language input to proof generation, and integrate further machine learning advances into formal reasoning. The paper provides a foundation for such work on automated mathematics.
