Autoformalization with Large Language Models (2205.12615v1)

Published 25 May 2022 in cs.LG, cs.AI, cs.LO, and cs.SE

Abstract: Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs. A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. While the long-term goal of autoformalization seemed elusive for a long time, we show LLMs provide new prospects towards this goal. We make the surprising observation that LLMs can correctly translate a significant portion ($25.3\%$) of mathematical competition problems perfectly to formal specifications in Isabelle/HOL. We demonstrate the usefulness of this process by improving a previously introduced neural theorem prover via training on these autoformalized theorems. Our methodology results in a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from $29.6\%$ to $35.2\%$.

Citations (135)

Summary

  • The paper presents a 25.3% autoformalization success rate when translating math competition problems into Isabelle/HOL using large language models.
  • It shows that autoformalized theorems improve a neural theorem prover’s performance from 29.6% to 35.2% on the MiniF2F benchmark.
  • The study uses LLM-generated formal statements as additional training data for a neural theorem prover, pointing toward broader applications in formal verification and program synthesis.

Autoformalization with LLMs

The paper "Autoformalization with LLMs" addresses the challenging task of translating natural language mathematics into formal specifications and proofs using LLMs. The authors highlight the potential impact of successful autoformalization in advancing fields such as formal verification, program synthesis, and artificial intelligence.

Key Contributions

  1. Translation Success: The paper presents evidence that LLMs attain a noteworthy success rate of 25.3% in autoformalizing mathematical competition problems into Isabelle/HOL, an interactive theorem prover.
  2. Utility in Theorem Proving: The authors show that autoformalized theorems enhance the performance of a neural theorem prover, improving proof rates on the MiniF2F benchmark from 29.6% to a state-of-the-art 35.2%.
  3. Data Generation Methodology: The approach uses LLMs to generate formal statements from natural language; these statements then serve as additional training data for theorem provers. This method leverages the capabilities of LLMs trained on vast corpora to work in few-shot learning scenarios, as sketched after this list.
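To make the few-shot setup concrete, here is a minimal prompting sketch in Python. It is not the paper's actual prompt or code: the example pair, the prompt wording, and the `complete` callback (standing in for whatever LLM completion endpoint is used, e.g. a Codex-style model) are illustrative assumptions.

```python
# Minimal few-shot autoformalization sketch (illustrative only; the paper
# reports its own prompt format and example pairs).

# Hypothetical (informal statement, Isabelle/HOL statement) example pairs.
FEW_SHOT_EXAMPLES = [
    (
        "Prove that the sum of two even natural numbers is even.",
        'theorem "even (a::nat) ==> even b ==> even (a + b)"',
    ),
    # ... a handful more pairs in practice
]

def build_prompt(informal_statement: str) -> str:
    """Concatenate worked examples, then ask for the target formalization."""
    parts = []
    for informal, formal in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Natural language version: {informal}\n"
            f"Translate the natural language version to an Isabelle version:\n"
            f"{formal}\n"
        )
    parts.append(
        f"Natural language version: {informal_statement}\n"
        f"Translate the natural language version to an Isabelle version:\n"
    )
    return "\n".join(parts)

def autoformalize(informal_statement: str, complete) -> str:
    """`complete` is any prompt -> completion function (the LLM being prompted)."""
    return complete(build_prompt(informal_statement)).strip()
```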

Methodology

The research focuses on case studies in which LLMs formalize mathematical competition problems. For example, Codex, a GPT-based model, translates a number of problems into syntactically correct Isabelle code. The paper details a rigorous manual evaluation process for categorizing failure cases and uses BLEU scores as a quantitative measure across different models and model scales, with Codex outperforming the other LLMs evaluated.
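As a rough illustration of the BLEU-based comparison, the snippet below scores a toy candidate formalization against a toy reference using NLTK's sentence-level BLEU. The statement strings, whitespace tokenization, and smoothing choice are assumptions for illustration; the paper's exact evaluation setup may differ.

```python
# Toy BLEU comparison between a generated and a reference formalization.
# Strings, tokenization, and smoothing are illustrative assumptions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = 'theorem "even (a::nat) ==> even b ==> even (a + b)"'.split()
candidate = 'theorem "even (a::nat) ==> even b ==> even (b + a)"'.split()

score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```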

The authors employ expert iteration as a self-improvement paradigm: proofs found for the autoformalized statements are used to iteratively fine-tune the neural theorem prover, Thor.
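Schematically, that loop can be written as follows. This is a hedged sketch, not the paper's pipeline: `prover.search`, `prover.check`, and `prover.finetune` are placeholder interfaces, and `autoformalize` reuses the hypothetical helper from the earlier prompting sketch.

```python
# Schematic expert-iteration loop over autoformalized statements.
# All prover methods below are placeholder interfaces, not the paper's code.

def expert_iteration(informal_problems, prover, llm, rounds=2):
    """Alternate proof search on autoformalized statements with fine-tuning
    the prover on whatever proofs were found and machine-checked."""
    training_data = []
    for _ in range(rounds):
        # 1. Autoformalize natural-language problems into Isabelle/HOL statements.
        statements = [autoformalize(p, llm) for p in informal_problems]

        # 2. Search for proofs; keep only the ones that check.
        for stmt in statements:
            proof = prover.search(stmt)  # assumed to return None on failure
            if proof is not None and prover.check(stmt, proof):
                training_data.append((stmt, proof))

        # 3. Fine-tune the prover on all successful proofs found so far, then repeat.
        prover.finetune(training_data)
    return prover
```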

Implications and Future Work

The research opens up promising directions for integrating LLMs into formal systems, suggesting that continuous training and retrieval-augmented models could handle larger theories. The potential for cycle-consistency-based training and back-translation to improve translations is noted.

Challenges include aligning informal and formal definitions, bridging logical gaps that informal proofs leave implicit, and coping with the wide variation in how natural-language mathematics is phrased.

Conclusion

This paper demonstrates the feasibility and utility of LLMs for autoformalization, positioning them as valuable tools not only for translating natural language into formal language but also for strengthening automated theorem-proving systems. The authors propose promising avenues of future work to address current limitations, aiming at a feedback loop between formalization and proving that could eventually approach human-level mathematical reasoning. This work lays a strong foundation for broader applications of LLMs in the domain of formal mathematics.