Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic (2402.14798v3)
Abstract: Recent LLMs enable new opportunities for structured reasoning with text, such as the construction of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy datasets and limited performance gains by modern neuro-symbolic engines. To address these problems, we formulate a consistent and theoretically grounded approach to annotating decompositional entailment and evaluate its impact on LLM-based textual inference. We find that our new dataset, RDTE (Recognizing Decompositional Textual Entailment), has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets. We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality, illustrating the practical benefit of this advance for textual inference.
- Nathaniel Weir
- Kate Sanders
- Orion Weller
- Shreya Sharma
- Dongwei Jiang
- Bhavana Dalvi Mishra
- Oyvind Tafjord
- Peter Jansen
- Peter Clark
- Benjamin Van Durme
- Zhengping Jiang