
LeanReasoner: Boosting Complex Logical Reasoning with Lean (2403.13312v1)

Published 20 Mar 2024 in cs.CL

Abstract: LLMs often struggle with complex logical reasoning due to logical inconsistencies and the inherent difficulty of such reasoning. We use Lean, a theorem proving framework, to address these challenges. By formalizing logical reasoning problems into theorems within Lean, we can solve them by proving or disproving the corresponding theorems. This method reduces the risk of logical inconsistencies with the help of Lean's symbolic solver. It also enhances our ability to treat complex reasoning tasks by using Lean's extensive library of theorem proofs. Our method achieves state-of-the-art performance on the FOLIO dataset and achieves performance near this level on ProofWriter. Notably, these results were accomplished by fine-tuning on fewer than 100 in-domain samples for each dataset.

Enhancing Logical Reasoning in AI with Lean: Introducing LeanReasoner

Overview of LeanReasoner

LeanReasoner is a novel framework designed to improve the performance of LLMs on complex logical reasoning tasks. By integrating Lean, a theorem proving framework, LeanReasoner formalizes logical reasoning problems into theorems and attempts to solve them by proving or disproving these theorems. The incorporation of Lean's symbolic solver significantly reduces the risk of logical inconsistencies and enhances the ability to manage intricate reasoning tasks. Through this method, LeanReasoner achieves state-of-the-art performance on the FOLIO dataset and near-state-of-the-art performance on ProofWriter, even with fewer than 100 in-domain samples for fine-tuning on each dataset.
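
To make the formalization step concrete, here is a minimal Lean sketch (Lean 3 syntax) of how a toy ProofWriter-style problem, "All cats are animals. Tom is a cat. Is Tom an animal?", might be encoded; the names and the exact encoding are illustrative assumptions, not output taken from the paper.

```lean
-- Context sentences become constants and axioms; the question becomes a theorem.
constant obj : Type
constant cat : obj → Prop
constant animal : obj → Prop
constant tom : obj

axiom rule1 : ∀ x : obj, cat x → animal x   -- "All cats are animals."
axiom fact1 : cat tom                        -- "Tom is a cat."

-- "Is Tom an animal?" is answered by proving (or failing to prove) this goal.
theorem goal : animal tom :=
begin
  apply rule1,   -- reduces the goal to `cat tom`
  exact fact1,
end
```

Proving `goal` yields the answer "True"; a proof of its negation would yield "False", and failure to prove either maps to "Unknown" under ProofWriter's three-way labeling.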

Key Components of LeanReasoner

LeanReasoner comprises four primary components; a hypothetical sketch of how they fit together follows the list:

  • The Formalizer: Utilizes OpenAI models (GPT-3 and GPT-4) to convert natural language contexts into formalized Lean theorems. It acts as the interface between natural language inputs and the symbolic world of theorem proving.
  • Tactic Generator: Employs the ReProver model, which combines premise retrieval with generative tactic prediction to construct proofs from the formalized theorem.
  • Proof Search Mechanism: Selects among candidate tactics and manages proof construction, expanding a proof tree until the theorem is proved or the search budget is exhausted.
  • Result Interpreter: Analyzes the output from the proof search to determine the correct answer among the provided options.
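
Since the summary provides no pseudocode, the following Python sketch is a hypothetical illustration of how these four components could interact as a best-first proof search; the `formalizer`, `generator`, and `env` interfaces are assumptions in the spirit of LeanDojo-style interaction, not the authors' actual API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_score: float                      # heapq is a min-heap; store negated log-prob
    state: object = field(compare=False)  # current Lean proof state
    tactics: list = field(compare=False, default_factory=list)

def prove(formalizer, generator, env, problem, budget=100, beam=8):
    """Hypothetical glue for the four components.

    formalizer(problem)  -> Lean theorem text         (Formalizer)
    generator(state, k)  -> [(tactic, log_prob), ...] (Tactic Generator)
    env                  -> Lean interaction layer with init_state,
                            run_tac, and is_solved    (Proof Search)
    Returns a list of tactics on success, else None; the Result
    Interpreter maps either outcome onto the dataset's answer options.
    """
    theorem = formalizer(problem)                   # natural language -> Lean
    frontier = [Node(0.0, env.init_state(theorem), [])]
    for _ in range(budget):
        if not frontier:
            break
        node = heapq.heappop(frontier)              # expand most promising state
        for tactic, logprob in generator(node.state, k=beam):
            new_state = env.run_tac(node.state, tactic)
            if new_state is None:                   # tactic failed to apply
                continue
            path = node.tactics + [tactic]
            if env.is_solved(new_state):            # no goals remain: proof found
                return path
            heapq.heappush(frontier, Node(node.neg_score - logprob, new_state, path))
    return None                                     # search budget exhausted
```

In this sketch the proof tree is implicit in the priority queue: each node carries its tactic path, and a proof of the negated theorem would be searched the same way before the interpreter falls back to "Unknown".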

Experimental Setup and Results

The LeanReasoner framework was evaluated on two logical reasoning datasets, ProofWriter and FOLIO. The experiments involved fine-tuning the tactic generator on a modest amount of domain-specific annotation: fewer than 100 in-domain samples per dataset.

  • ProofWriter: LeanReasoner achieved near-state-of-the-art performance, leveraging the rigor of Lean's symbolic solver to navigate the dataset's multi-step deductive rules. The approach's efficiency is underscored by the high accuracy it reaches with minimal in-domain samples for fine-tuning.
  • FOLIO: The framework achieved state-of-the-art performance, a notable result given FOLIO's more complex first-order logic and varied linguistic constructs. Its success on FOLIO highlights its capability in tackling advanced logical reasoning challenges.
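
As a rough idea of what such annotation could look like, here is a hypothetical shape for the tactic-generator fine-tuning data, assuming LeanDojo-style (proof state, next tactic) pairs; the schema and field names are illustrative, not the paper's actual format.

```python
# Hypothetical fine-tuning records extracted from annotated Lean proofs:
# each maps a pretty-printed proof state to the next tactic the proof took.
training_pairs = [
    {
        "state": "rule1 : ∀ x : obj, cat x → animal x\n"
                 "fact1 : cat tom\n"
                 "⊢ animal tom",
        "tactic": "apply rule1",
    },
    {
        "state": "rule1 : ∀ x : obj, cat x → animal x\n"
                 "fact1 : cat tom\n"
                 "⊢ cat tom",
        "tactic": "exact fact1",
    },
]
```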

Implications and Speculation on Future Developments

LeanReasoner's introduction marks a significant advancement in combining symbolic solvers with LLMs for logical reasoning. It demonstrates the potential of using theorem provers like Lean to fortify the logical reasoning capabilities of LLMs, ensuring outputs that adhere strictly to logical rules.

This research's implications extend beyond merely enhancing model performance on reasoning tasks. It suggests a promising direction for future AI development, where the fusion of symbolic reasoning and natural language understanding can lead to more reliable, logically consistent AI systems.

Looking ahead, further exploration into the integration of different symbolic solvers, optimizing the formalization process, and scaling the approach to accommodate a broader range of logical reasoning tasks appear to be promising avenues. Additionally, investigating the impact of training LLMs on datasets specifically tailored for theorem proving could further enhance their reasoning faculties, potentially leading to breakthroughs in AI's logical reasoning capabilities.

In conclusion, LeanReasoner blends the structured rigor of symbolic solvers with the flexible language understanding of LLMs. Its strong results on challenging datasets underscore the robustness of this method and point toward a productive direction for AI research in logical reasoning and theorem proving.

Authors (3)
  1. Dongwei Jiang
  2. Marcio Fonseca
  3. Shay B. Cohen