Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation (2404.01677v2)
Abstract: LLMs have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because the previous LLMs-based reasoning systems have the theoretical incompleteness issue. As a result, it can only address a limited set of simple reasoning problems, which significantly decreases their generalization ability. To address this issue, we propose a novel framework, named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation has the capability to solve all first-order logic reasoning problems by extending reasoning rules and employing the principle of proof by contradiction, so our system's completeness can be improved by introducing resolution refutation. Experimental results demonstrate that our system outperforms previous works by achieving state-of-the-art performances in complex scenarios while maintaining performances in simple scenarios. Besides, we observe that GFaiR is faithful to its reasoning process.
- A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023.
- Samuel R Buss. 1998. An introduction to proof theory. Handbook of proof theory, 137:1–78.
- Transformers as soft reasoners over language. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, pages 3882–3890.
- Antonia Creswell and Murray Shanahan. 2022. Faithful reasoning using large language models. arXiv preprint arXiv:2208.14271.
- Ernest Davis. 2017. Logical formalizations of commonsense reasoning: A survey. Journal of Artificial Intelligence Research, 59:651–723.
- ROSCOE: A suite of metrics for scoring step-by-step reasoning. In The Eleventh International Conference on Learning Representations.
- Folio: Natural language reasoning with first-order logic. arXiv preprint arXiv:2209.00840.
- Peter Hase and Mohit Bansal. 2020. Evaluating explainable ai: Which algorithmic explanations help users predict model behavior? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5540–5552.
- Lambada: Backward chaining for automated reasoning in natural language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6547–6568.
- Logiqa: a challenge dataset for machine reading comprehension with logical reasoning. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, pages 3622–3628.
- Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
- Faithful chain-of-thought reasoning. arXiv preprint arXiv:2301.13379.
- Learning deductive reasoning from synthetic corpus based on formal logic. In Proceedings of the 40th International Conference on Machine Learning,, pages 25254–25274.
- A survey on theorem provers in formal methods. arXiv preprint arXiv:1912.03028.
- Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. arXiv preprint arXiv:2305.12295.
- Neural unification for logic reasoning over natural language. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3939–3950.
- Reasoning with language model prompting: A survey. arXiv preprint arXiv:2212.09597.
- Interpretable proof generation via iterative backward reasoning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2968–2981.
- STREET: A MULTI-TASK STRUCTURED REASONING AND EXPLANATION BENCHMARK. In The Eleventh International Conference on Learning Representations.
- Kyle Richardson and Ashish Sabharwal. 2022. Pushing the limits of rule reasoning in transformers through natural language satisfiability. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11209–11219.
- John Alan Robinson. 1965. A machine-oriented logic based on the resolution principle. Journal of the ACM (JACM), 12(1):23–41.
- Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5):206–215.
- Stuart J Russell. 2010. Artificial intelligence a modern approach. Pearson Education, Inc.
- Prover: Proof generation for interpretable reasoning over rules. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 122–136.
- multiprover: Generating multiple proofs for improved interpretability in rule reasoning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3662–3677.
- Fairr: Faithful and robust deductive reasoning over natural language. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1075–1093.
- Can transformers reason in fragments of natural language? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11184–11199.
- Probabilistic graph reasoning for natural proof generation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3140–3151.
- Proofwriter: Generating implications, proofs, and abductive statements over natural language. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3621–3634.
- Diagnosing the first-order logical reasoning ability through logicnli. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3738–3747.
- Chain of thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, pages 24824–24837.
- Generating natural language proofs with verifier-guided search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 89–105.
- Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, page 5754–5764.
- Reclor: A reading comprehension dataset requiring logical reasoning. In International Conference on Learning Representations.
- On the paradox of learning to reason from data. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pages 3365–3373.
- Zhouhao Sun (8 papers)
- Xiao Ding (38 papers)
- Li Du (72 papers)
- Bibo Cai (3 papers)
- Jinglong Gao (9 papers)
- Ting Liu (329 papers)
- Qin Bing (3 papers)