
An In-Context Learning Agent for Formal Theorem-Proving (2310.04353v5)

Published 6 Oct 2023 in cs.LG, cs.AI, cs.LO, and cs.PL

Abstract: We present an in-context learning agent for formal theorem-proving in environments like Lean and Coq. Current state-of-the-art models for the problem are finetuned on environment-specific proof data. By contrast, our approach, called COPRA, repeatedly asks a high-capacity, general-purpose LLM (GPT-4) to propose tactic applications from within a stateful backtracking search. Proposed tactics are executed in the underlying proof environment. Feedback from the execution is used to build the prompt for the next model query, along with selected information from the search history and lemmas retrieved from an external database. We evaluate our implementation of COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the CompCert project. On these benchmarks, COPRA significantly outperforms few-shot invocations of GPT-4. It also compares favorably against finetuning-based approaches, outperforming ReProver, a state-of-the-art finetuned approach for Lean, in terms of the pass@1 metric. Our code and data are available at https://github.com/trishullab/copra.

Summary

  • The paper introduces Copra, an agent leveraging GPT-4's in-context learning combined with intelligent search and step-by-step environment interaction for formal theorem-proving.
  • Copra achieves competitive performance on Lean and Coq benchmarks, demonstrating that in-context learning with search can rival or surpass fine-tuned systems in efficiency.
  • This approach highlights the potential of general-purpose LLMs like GPT-4 to generalize across formal systems and bridge neural-symbolic methods for complex reasoning tasks.

An In-Context Learning Agent for Formal Theorem-Proving

The paper "An In-Context Learning Agent for Formal Theorem-Proving" explores a novel approach to automated theorem-proving that exploits the in-context learning capabilities of contemporary LLMs, particularly GPT-4. The researchers introduce a system named Copra, which contrasts with existing paradigms that depend heavily on models fine-tuned on formal proofs from specific environments like Lean or Coq.

Overview and Contributions

The heart of Copra lies in leveraging in-context learning, which uses the LLM's ability to adapt to different contexts provided by prompts to predict the next proof step. Here, GPT-4 is deployed from within a stateful search algorithm, in which each proposed tactic is executed in the theorem-proving environment. The failure or success of tactics informs subsequent moves, illustrating a dynamic interplay between LLM predictions and environment feedback.
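To make the prompt-building step concrete, here is a minimal sketch of how such a query might be assembled. The field names and format are illustrative assumptions, not the paper's actual prompt template:

```python
def build_prompt(goals, last_error, history, lemmas):
    """Assemble a hypothetical COPRA-style prompt from the current proof
    state, execution feedback, search history, and retrieved lemmas."""
    parts = ["You are proving a theorem. Current goals:"]
    parts += [f"  {g}" for g in goals]
    if last_error:
        # feedback from the proof environment on the last failed tactic
        parts.append(f"Previous tactic failed with: {last_error}")
    if history:
        # selected search history, so the model avoids repeating itself
        parts.append("Tactics already tried on this state: "
                     + ", ".join(t for t, _ in history))
    if lemmas:
        # lemmas retrieved from an external database
        parts.append("Possibly useful lemmas:")
        parts += [f"  {l}" for l in lemmas]
    parts.append("Propose the next tactic.")
    return "\n".join(parts)

print(build_prompt(["⊢ q ∧ p"], None, [("simp", "no progress")], ["and.intro"]))
```

Each model query thus carries the current goals plus whatever the environment and the search have learned so far, which is what distinguishes this loop from a single few-shot call.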

Copra's approach is characterized by several key features:

  1. Integration of Intelligent Search: By invoking the LLM repeatedly from within a backtracking search, Copra goes beyond one-shot few-shot inference. It keeps only tactics that make genuine progress, discarding those whose resulting proof states are no simpler than states already encountered.
  2. Retrieval-Augmented Contexts: The system utilizes external databases for lemma and theorem retrieval, thus reinforcing the model's capacity to deduce formal proofs without explicit environment-centric fine-tuning.
  3. Interactive Theorem Proving: Rather than pre-committing to an entire proof sequence at once, Copra interacts with the theorem-proving environment step-by-step, utilizing execution feedback to avoid pitfalls like cyclic dependencies or inefficient tactic sequences. This interactivity introduces robustness against hallucinations often associated with LLM outputs.
  4. General-Purpose LLM Utilization: Remarkably, the approach demonstrates the power of a black-box LLM when combined with intelligent prompting and environment interaction, achieving competitive results across differing theorem-proving frameworks without bespoke model training for each.
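The interplay of features 1 and 3 can be sketched as a toy backtracking loop. This is an illustrative reconstruction, not the authors' implementation: the GPT-4 call is mocked by `propose_tactics`, and `ToyEnv` stands in for Lean/Coq, so all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProofState:
    goals: tuple  # remaining proof obligations

class ToyEnv:
    """Stand-in proof environment: tactic 'close' discharges a goal
    ending in '!', anything else fails with an error message."""
    def run(self, tactic, state):
        if tactic == "close" and state.goals and state.goals[0].endswith("!"):
            return True, ProofState(state.goals[1:]), None
        return False, state, f"tactic {tactic!r} failed on {state.goals[0]!r}"

def propose_tactics(state, history):
    # In COPRA this is a GPT-4 query whose prompt includes the goals,
    # prior execution errors from `history`, and retrieved lemmas.
    return ["simp", "close"]  # a fixed ranking, for the sketch only

def search(env, state, seen, history, depth=0, max_depth=8):
    if not state.goals:
        return []                          # proof complete
    if depth >= max_depth:
        return None
    for tactic in propose_tactics(state, history):
        ok, nxt, err = env.run(tactic, state)
        if not ok:
            history.append((tactic, err))  # failure feedback -> next prompt
            continue
        if nxt.goals in seen:              # keep only genuinely new states
            continue
        seen.add(nxt.goals)
        sub = search(env, nxt, seen, history, depth + 1, max_depth)
        if sub is not None:
            return [tactic] + sub
        # otherwise backtrack and try the next proposed tactic
    return None

env = ToyEnv()
start = ProofState(("a!", "b!"))
proof = search(env, start, {start.goals}, [])
print(proof)  # -> ['close', 'close']
```

The key design point is that failed tactic executions are not discarded: they accumulate in `history` and shape the next model query, while the `seen` set prevents cycles through previously visited proof states.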

Results and Implications

Empirically, Copra's efficacy is tested on the miniF2F benchmark for Lean and on a set of Coq tasks drawn from the CompCert compiler-verification project. Notably, Copra surpasses few-shot GPT-4 invocations, illustrating that the synergy of search and LLM-guided step-taking is integral to its success. Against other state-of-the-art systems, the approach shows comparable performance, outperforming ReProver, a state-of-the-art finetuned Lean prover, on the pass@1 metric while requiring fewer queries per proof, indicating enhanced efficiency.
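For readers unfamiliar with tactic-based proving, here is a hypothetical example (not drawn from the paper's benchmarks) of the kind of Lean proof Copra searches for; each tactic line corresponds to one step executed and checked in the environment:

```lean
theorem and_comm' (p q : Prop) : p ∧ q → q ∧ p :=
begin
  intro h,            -- h : p ∧ q ⊢ q ∧ p
  cases h with hp hq, -- hp : p, hq : q ⊢ q ∧ p
  split,              -- two goals: ⊢ q and ⊢ p
  { exact hq },
  { exact hp }
end
```

After each tactic, the prover reports the updated goals (or an error), and it is exactly this feedback that Copra folds into its next GPT-4 query.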

Copra underscores the potential of LLMs like GPT-4 to generalize across multiple domains within theorem proving, suggesting advancements in AI applications from mathematics to software verification. The method bridges the gap between purely neural and purely symbolic approaches, leveraging strengths from each to navigate complex proof spaces effectively.

Speculation on Future Directions

Looking forward, this research opens multiple avenues for exploration, among them:

  • Fine-Tuning and Scaling: Testing whether LLMs smaller than or comparable to GPT-4, augmented with in-context learning and symbolically synthesized training data, can approach these tasks could yield cost-effective alternatives.
  • Cross-Domain Adaptability: Extending the methodology to other formal systems, including those with built-in decision procedures or tactic languages, could test the generality of the technique.

Through Copra, the paper demonstrates a significant stride in harnessing the contextual understanding of LLMs beyond routine NLP tasks, extending their applicability to domains such as formal theorem-proving that demand high precision and iterative, feedback-driven reasoning.
