Autoformalizing Euclidean Geometry (2405.17216v1)

Published 27 May 2024 in cs.LG, cs.AI, cs.LO, and stat.ML

Abstract: Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean geometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and LLMs. One challenge in Euclidean geometry is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. To address this issue, we use theorem provers to fill in such diagrammatic information automatically, so that the LLM only needs to autoformalize the explicit textual steps, making it easier for the model. We also provide automatic semantic evaluation for autoformalized theorem statements. We construct LeanEuclid, an autoformalization benchmark consisting of problems from Euclid's Elements and the UniGeo dataset formalized in the Lean proof assistant. Experiments with GPT-4 and GPT-4V show the capability and limitations of state-of-the-art LLMs on autoformalizing geometry problems. The data and code are available at https://github.com/loganrjmurphy/LeanEuclid.


Summary

  • The paper introduces a neuro-symbolic approach that formalizes Euclidean geometry by bridging informal descriptions with machine-checkable proofs using LLMs and SMT solvers.
  • The paper details the development of the LeanEuclid benchmark from Euclid’s Elements and reports a 21% success rate in autoformalizing theorems with advanced models.
  • The paper demonstrates that combining diagrammatic reasoning with textual analysis enhances autoformalization accuracy, paving the way for improved AI-driven theorem proving.

Autoformalizing Euclidean Geometry: Bridging Informal and Formal Mathematics

The paper "Autoformalizing Euclidean Geometry" presents a comprehensive approach towards translating informal geometric descriptions into formal theorems and proofs suitable for machine verification. This research is significant as it addresses a critical challenge in applying machine learning to mathematics: the ability to accurately formalize informal mathematical concepts. The domain of Euclidean geometry is especially pertinent to this paper given its reliance on diagrams and implicit reasoning, providing a controlled environment for exploring these translation methodologies.

The authors introduce a neuro-symbolic framework that integrates domain knowledge, SMT solvers, and LLMs to autoformalize Euclidean geometry. A primary challenge is that informal geometric proofs rely on facts read off the diagram and never stated in the text, leaving logical gaps. The proposed system addresses this by using automated theorem provers to supply the missing diagrammatic reasoning, so that the LLM only needs to formalize the explicit textual steps; a minimal sketch of this division of labor follows.
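To make the shape of the problem concrete, the following is a minimal Lean 4 sketch in the spirit of Elements Book I, Proposition 1. The type `Point`, the function `dist`, the predicate `distinct`, and the statement itself are illustrative assumptions rather than the actual LeanEuclid formalization; the point is that the existence of the intersection of the two auxiliary circles, which Euclid reads off the diagram, is exactly the kind of fact the framework delegates to a symbolic prover rather than to the LLM.

```lean
-- Minimal, self-contained sketch (hypothetical names; not the LeanEuclid API).
-- Elements I.1: on a given segment ab, an equilateral triangle can be
-- constructed. Euclid's informal proof obtains c as the intersection of two
-- circles and reads the intersection's existence off the diagram; in the
-- paper's framework such diagrammatic facts are discharged symbolically.

axiom Point : Type
axiom dist : Point → Point → Float
axiom distinct : Point → Point → Prop

theorem proposition_1 (a b : Point) (hab : distinct a b) :
    ∃ c : Point, dist a c = dist a b ∧ dist b c = dist a b := by
  sorry  -- the diagrammatic step (that the circles intersect) is left to a prover
```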

The paper details the construction of a Lean-based benchmark, LeanEuclid, derived from Euclid’s Elements and the UniGeo dataset, and evaluates state-of-the-art models, GPT-4 and its multimodal variant GPT-4V, on it. These experiments probe both the capability and the limitations of LLMs in translating human-readable geometry problems into formal theorems and proofs. LeanEuclid serves both as a testbed for these models and as a dataset promoting further research into autoformalization.

A significant contribution is an SMT-based symbolic reasoning engine used for two core functions: validating equivalence between autoformalized theorem statements and filling reasoning gaps in geometric proofs. This extends evaluation beyond surface-level syntactic matching and grounds it in the semantic equivalence of propositions, which matters because the same theorem can be formalized in many superficially different yet equally correct ways. A toy version of the equivalence check is sketched below.
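The sketch below illustrates the underlying idea with the Z3 SMT solver: two candidate formalizations over a shared vocabulary are treated as equivalent when each direction of the implication is valid, i.e. when its negation is unsatisfiable. The coordinate encoding, predicate choices, and variable names are assumptions made for this example and are not the paper's actual evaluation pipeline.

```python
# Hedged sketch of SMT-based equivalence checking between two candidate
# formalizations (illustrative only; not the paper's evaluation pipeline).
# Two statements are treated as equivalent when each implies the other,
# i.e. when "p and not q" is unsatisfiable in both directions.

from z3 import Reals, And, Not, Solver, unsat

# Coordinates of three points a, b, c in the plane (an assumed encoding).
ax, ay, bx, by, cx, cy = Reals("ax ay bx by cx cy")

def sq_dist(px, py, qx, qy):
    """Squared Euclidean distance between two points."""
    return (px - qx) * (px - qx) + (py - qy) * (py - qy)

# "Ground truth": c is equidistant from a and b, at distance |ab|.
gold = And(sq_dist(ax, ay, cx, cy) == sq_dist(ax, ay, bx, by),
           sq_dist(bx, by, cx, cy) == sq_dist(ax, ay, bx, by))

# Model output: the same claim with the arguments written in a different order.
pred = And(sq_dist(cx, cy, ax, ay) == sq_dist(bx, by, ax, ay),
           sq_dist(cx, cy, bx, by) == sq_dist(bx, by, ax, ay))

def entails(p, q):
    """Check that p implies q by asking whether p and not-q is unsatisfiable."""
    s = Solver()
    s.add(And(p, Not(q)))
    return s.check() == unsat

print("equivalent:", entails(gold, pred) and entails(pred, gold))
```

A purely syntactic comparison (exact string match or BLEU) would count the reordered statement as different; a solver-based check accepts it, which is the motivation for grounding evaluation in semantics rather than surface form.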

The quantitative results are instructive. GPT-4V's ability to outperform its text-only counterpart on certain tasks underscores the benefit of multimodal input. Nevertheless, the relatively low success rates, around 21% on theorem formalization even with advanced models, highlight the complexity and subtlety of the task.

These findings have implications for both practice and theory. Practically, better autoformalization techniques can strengthen mathematical software, from automated theorem provers to tutoring systems. Theoretically, domains like Euclidean geometry force models to handle complex, multimodal reasoning, which can steer advances in model architecture and training methodology.

Future developments will likely explore more sophisticated symbolic reasoning techniques and more nuanced integration of multimodal data to improve formalization accuracy. Increasing benchmark sizes and diversifying problem types will be crucial to facilitating broader progress across different mathematical domains.

In conclusion, this paper illuminates both the intricacies of translating informal mathematical texts into formal logic and the promising pathways AI opens in this domain. As models continue to evolve, embracing the challenges and nuances of tasks like autoformalization is essential for advancing AI’s role in understanding and generating mathematical proofs.