BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving (2403.03401v1)
Abstract: Artificial Intelligence for Theorem Proving has given rise to a plethora of benchmarks and methodologies, particularly in Interactive Theorem Proving (ITP). Research in the area is fragmented, with a diverse set of approaches spread across several ITP systems. This makes it difficult to compare methods, which are often complex and hard to replicate. To address this, we present BAIT, a framework for fair and streamlined comparison of learning approaches in ITP. We demonstrate BAIT's capabilities with an in-depth comparison, across several ITP benchmarks, of state-of-the-art architectures applied to the problem of formula embedding. We find that Structure Aware Transformers perform particularly well, improving on the techniques originally proposed alongside each problem set. BAIT also allows us to assess the end-to-end proving performance of systems built on interactive environments. This unified perspective reveals a novel end-to-end system that improves on prior work. We also provide a qualitative analysis, illustrating that improved performance is associated with more semantically aware embeddings. By streamlining the implementation and comparison of Machine Learning algorithms in the ITP context, we anticipate that BAIT will be a springboard for future research.