A New Approach Towards Autoformalization (2310.07957v3)

Published 12 Oct 2023 in cs.CL and cs.AI

Abstract: Verifying mathematical proofs is difficult, but the process can be automated with computer assistance. Autoformalization is the task of automatically translating natural-language mathematics into a formal language that can be verified by a program. This is challenging, especially for the higher-level mathematics found in research papers, which requires large amounts of background knowledge and context. In this paper, we propose an avenue toward tackling autoformalization for research-level mathematics by breaking the task into easier, more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking to the proper theorems and definitions), and finally type adjustment so the result passes the type checker. In addition, we present arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems sampled from papers on arXiv.org and formalized for the Lean theorem prover. We welcome contributions from the community to future versions of this dataset.
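To make the proposed pipeline concrete, here is a hypothetical Lean sketch (not taken from the paper or the arXiv2Formal dataset) of what an "unlinked" formalization might look like: the translator emits a theorem statement with a locally stubbed definition instead of the proper library name, and later stages would resolve the link and fix types.

```lean
-- Hypothetical illustration of unlinked formalization.
-- `isEven` is a placeholder stub invented by the translator,
-- not yet linked to Mathlib's existing `Even` predicate.
axiom isEven : Nat → Prop

-- Unlinked statement: well-formed Lean, but its definitions are unresolved.
theorem even_add_even_unlinked (m n : Nat)
    (hm : isEven m) (hn : isEven n) : isEven (m + n) :=
  sorry

-- Entity linking would replace `isEven` with Mathlib's `Even`,
-- and the type-adjustment stage would repair any resulting
-- mismatches so the statement passes the type checker.
```

In this sketch the three subtasks map onto distinct transformations of the same statement, which is what makes each one more approachable than end-to-end autoformalization.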
