EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages (2402.16878v1)

Published 12 Feb 2024 in cs.AI, cs.CL, cs.LG, and cs.NE

Abstract: Formal mathematics is the discipline of translating mathematics into a programming language in which any statement can be unequivocally checked by a computer. Mathematicians and computer scientists have spent decades of painstaking formalization efforts developing languages such as Coq, HOL, and Lean. Machine learning research has converged on these formal math corpora and given rise to an assortment of methodologies to aid in interactive and automated theorem proving. However, these papers have primarily focused on one method, for one proof task, in one language. This paper introduces EvoGPT-f: a novel evolutionary framework for the first systematic quantitative analysis of the differential machine learnability of five formal math corpora (Lean 3, Lean 4, Coq, HOL 4, HOL Light) using four tokenization methods (character, word-level, Byte Pair Encoding and StarCoder tokenizer). This paper does not put to rest the question of the "best" or "easiest" language to learn. Rather, this framework and preliminary findings begin to illuminate the differential machine learnability of these languages, offering a foundation to forge more systematic quantitative and qualitative comparative research across communities.
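To make the tokenization comparison concrete, the following is a minimal sketch of two of the four methods named in the abstract: character-level and word-level tokenization. It is not the paper's implementation. The Lean-style snippet, the function names, and the regular expression used to split words are all illustrative assumptions.

```python
import re

def char_tokenize(text):
    # Character-level: every character, including spaces, is one token.
    return list(text)

def word_tokenize(text):
    # Word-level (one assumed convention): runs of word characters form a
    # token, and each remaining non-space symbol is a token of its own.
    return re.findall(r"\w+|[^\w\s]", text)

# Hypothetical Lean-style statement, used only as sample input.
snippet = "theorem add_zero (n : nat) : n + 0 = n"

char_tokens = char_tokenize(snippet)
word_tokens = word_tokenize(snippet)

print(len(char_tokens), "character tokens")
print(len(word_tokens), "word tokens:", word_tokens)
```

The same statement produces a much longer sequence under character-level tokenization than under word-level tokenization, while the word-level vocabulary grows with every new identifier. Trade-offs of this kind are one reason a corpus may be easier or harder to learn under different tokenizers.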

