EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages (2402.16878v1)
Abstract: Formal mathematics is the discipline of translating mathematics into a programming language in which any statement can be unequivocally checked by a computer. Mathematicians and computer scientists have spent decades of painstaking effort developing formal languages such as Coq, HOL, and Lean and formalizing mathematics in them. Machine learning research has converged on these formal math corpora, giving rise to an assortment of methodologies to aid interactive and automated theorem proving. However, most of this work has focused on one method, for one proof task, in one language. This paper introduces EvoGPT-f: a novel evolutionary framework for the first systematic quantitative analysis of the differential machine learnability of five formal math corpora (Lean 3, Lean 4, Coq, HOL4, HOL Light) using four tokenization methods (character-level, word-level, Byte Pair Encoding, and the StarCoder tokenizer). This paper does not settle the question of which language is the "best" or "easiest" to learn. Rather, the framework and preliminary findings begin to illuminate the differential machine learnability of these languages, offering a foundation for more systematic quantitative and qualitative comparative research across communities.
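To make the tokenization comparison concrete, the following sketch splits a two-line Lean 4 snippet in three ways. It uses a character-level tokenizer, a word-level tokenizer, and a tiny Byte Pair Encoding learner that repeatedly merges the most frequent adjacent symbol pair. This is an illustration under stated assumptions, not the authors' implementation. The word-level splitting regex, the sample snippet, and the helper names (`char_tokenize`, `word_tokenize`, `merge_pair`, `learn_bpe`, `bpe_tokenize`) are hypothetical. The StarCoder tokenizer requires a pretrained vocabulary, so it is omitted here.

```python
import re
from collections import Counter


def char_tokenize(text: str) -> list[str]:
    """Character-level: every character, including whitespace, is a token."""
    return list(text)


def word_tokenize(text: str) -> list[str]:
    """Word-level (assumed scheme): identifier-like runs or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace each non-overlapping adjacent occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merges over whitespace-delimited words."""
    vocab = Counter(tuple(word) for word in text.split())  # word -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
    return merges


def bpe_tokenize(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in order, to each word of `text`."""
    tokens = []
    for word in text.split():
        symbols = tuple(word)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


# Hypothetical sample drawn in the style of a Lean 4 corpus.
sample = (
    "theorem add_zero (n : Nat) : n + 0 = n := by simp\n"
    "theorem zero_add (n : Nat) : 0 + n = n := by simp"
)
merges = learn_bpe(sample, num_merges=20)
for name, toks in [
    ("character", char_tokenize(sample)),
    ("word-level", word_tokenize(sample)),
    ("BPE", bpe_tokenize(sample, merges)),
]:
    print(f"{name:>10}: {len(toks)} tokens")
```

The three schemes trade vocabulary size against sequence length. Character-level tokenization has a tiny vocabulary but long sequences, while word-level and BPE tokenization shorten sequences at the cost of a larger vocabulary. This trade-off is one plausible reason the choice of tokenizer can change how learnable a given corpus appears.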