Conditional and Modal Reasoning in Large Language Models (2401.17169v4)
Abstract: The reasoning abilities of LLMs are the topic of a growing body of research in AI and cognitive science. In this paper, we probe the extent to which twenty-nine LLMs are able to distinguish logically correct inferences from logically fallacious ones. We focus on inference patterns involving conditionals (e.g., 'If Ann has a queen, then Bob has a jack') and epistemic modals (e.g., 'Ann might have an ace', 'Bob must have a king'). These inferences have been of special interest to logicians, philosophers, and linguists, since they play a central role in the fundamental human ability to reason about distal possibilities. Assessing LLMs on these inferences is thus highly relevant to the question of how much the reasoning abilities of LLMs match those of humans. All the LLMs we tested make some basic mistakes with conditionals or modals, though zero-shot chain-of-thought prompting helps them make fewer mistakes. Even the best performing LLMs make basic errors in modal reasoning, display logically inconsistent judgments across inference patterns involving epistemic modals and conditionals, and give answers about complex conditional inferences that do not match reported human judgments. These results highlight gaps in basic logical reasoning in today's LLMs.
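The evaluation the abstract describes can be sketched as a simple probe harness: each item pairs premises with a candidate conclusion labeled as logically valid or fallacious, and a model's yes/no judgments are scored against the labels. The code below is a minimal illustration, not the authors' code; the `naive_judge` stand-in model is hypothetical, a deliberately shallow word-overlap heuristic of the kind such probes are designed to catch.

```python
# Probes: (premises, conclusion, is_logically_valid).
PROBES = [
    # Modus ponens (valid): if p then q; p; therefore q.
    (("If Ann has a queen, then Bob has a jack.", "Ann has a queen."),
     "Bob has a jack.", True),
    # Affirming the consequent (fallacious): if p then q; q; therefore p.
    (("If Ann has a queen, then Bob has a jack.", "Bob has a jack."),
     "Ann has a queen.", False),
    # Epistemic modal (fallacious): 'might p' does not entail p.
    (("Ann might have an ace.",), "Ann has an ace.", False),
]

def make_prompt(premises, conclusion):
    """Format an inference as a yes/no question for a model."""
    lines = [f"Premise: {p}" for p in premises]
    lines.append(f"Does it follow that: {conclusion}")
    lines.append("Answer yes or no.")
    return "\n".join(lines)

def accuracy(judge, probes):
    """Fraction of probes where the model's judgment matches the label.
    `judge` maps a prompt string to True (follows) or False (does not)."""
    hits = sum(judge(make_prompt(p, c)) == label for p, c, label in probes)
    return hits / len(probes)

def naive_judge(prompt):
    """Hypothetical stand-in model: answers 'yes' whenever every word of
    the conclusion already appears somewhere in the premises."""
    premise_part, question = prompt.split("Does it follow that: ")
    conclusion = question.splitlines()[0]
    return all(w in premise_part for w in conclusion.rstrip(".").split())
```

On these three probes the naive judge accepts the fallacious affirming-the-consequent item, illustrating how a surface heuristic can pass some patterns while failing others; in the paper's actual setup, `judge` would wrap calls to each of the twenty-nine LLMs under test.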