A Survey on Deep Learning for Theorem Proving (2404.09939v3)
Abstract: Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in natural language to rigorous derivations in formal systems. In recent years, the advancement of deep learning, especially the emergence of LLMs, has sparked a notable surge of research exploring these techniques to enhance the process of theorem proving. This paper presents a comprehensive survey of deep learning for theorem proving by offering (i) a thorough review of existing approaches across various tasks such as autoformalization, premise selection, proofstep generation, and proof search; (ii) an extensive summary of curated datasets and strategies for synthetic data generation; (iii) a detailed analysis of evaluation metrics and the performance of state-of-the-art methods; and (iv) a critical discussion on the persistent challenges and the promising avenues for future exploration. Our survey aims to serve as a foundational reference for deep learning approaches in theorem proving, inspiring and catalyzing further research endeavors in this rapidly growing field. A curated list of papers is available at https://github.com/zhaoyu-li/DL4TP.
Authors:
- Zhaoyu Li
- Jialiang Sun
- Logan Murphy
- Qidong Su
- Zenan Li
- Xian Zhang
- Kaiyu Yang
- Xujie Si