
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving (2404.07382v3)

Published 10 Apr 2024 in cs.AI and cs.LO

Abstract: Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e., proof steps) to search through proof states. The current paradigm trains the model solely on successful proof paths, which creates a discrepancy at inference time: the model must sample and try various tactics at each proof state until it finds one that succeeds, yet it has never learned from failed attempts. Intuitively, a tactic that leads to a failed search path signals that similar tactics should receive less attention during subsequent trials. In this paper, we demonstrate the benefit of additionally training models on failed search paths. Because existing open-source theorem-proving datasets lack such trial-and-error data, we curate a dataset of intuitionistic propositional logic theorems and formalize it in Lean, so that the correctness of proofs can be checked reliably. We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on correct paths, and find that the former solves more unseen theorems with fewer search trials.
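
As a concrete illustration of what "tactics as proof steps" means in this setting, below is a minimal sketch in Lean 4 of an intuitionistic propositional theorem proved by a short tactic sequence. The theorem, its name (imp_and), and the tactic choices are illustrative assumptions; they are not drawn from the paper's dataset, and the paper's exact Lean version and proof encoding are not specified here.

    -- Illustrative example only (not from the paper's dataset): an
    -- intuitionistic propositional theorem proved by a sequence of tactics.
    theorem imp_and (p q r : Prop) (h : p → q) : p ∧ r → q ∧ r := by
      intro hpr        -- proof state now has hpr : p ∧ r, with goal q ∧ r
      constructor      -- split the conjunction into two subgoals: q and r
      · exact h hpr.1  -- close goal q by applying h to the left component of hpr
      · exact hpr.2    -- close goal r with the right component of hpr

In the tactic-search setting the abstract describes, a prover model samples candidate tactics at each intermediate proof state; candidates rejected by the proof checker or leading to dead ends form the failed search paths that TrialMaster additionally learns from, whereas models trained only on correct paths never see them.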

Authors (9)
  1. Chenyang An
  2. Zhibo Chen
  3. Qihao Ye
  4. Emily First
  5. Letian Peng
  6. Jiayun Zhang
  7. Zihan Wang
  8. Sorin Lerner
  9. Jingbo Shang