Dissociation of Faithful and Unfaithful Reasoning in LLMs (2405.15092v2)
Abstract: LLMs often improve their performance on downstream tasks when they generate Chain of Thought reasoning text before producing an answer. We investigate how LLMs recover from errors in Chain of Thought. Through analysis of error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, which occurs when models arrive at the correct answer despite invalid reasoning text. We identify factors that shift LLM recovery behavior: LLMs recover more frequently from obvious errors and in contexts that provide more evidence for the correct answer. Critically, these factors have divergent effects on faithful and unfaithful recoveries. Our results indicate that distinct mechanisms drive faithful and unfaithful error recoveries. Selectively targeting these mechanisms could reduce the rate of unfaithful reasoning and improve model interpretability.
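To make the faithful/unfaithful distinction concrete, here is a minimal sketch (not the authors' code) of how an error-recovery outcome might be labeled, assuming one can judge both the validity of the reasoning text and the correctness of the final answer; the `Trace` fields and `Recovery` taxonomy are illustrative assumptions based on the abstract.

```python
# Hypothetical labeling of error-recovery outcomes after an error is
# injected into a model's chain of thought. Field names and the label
# taxonomy are assumptions for illustration, not the paper's implementation.
from dataclasses import dataclass
from enum import Enum, auto

class Recovery(Enum):
    NO_RECOVERY = auto()  # reasoning stays invalid and the answer is wrong
    FAITHFUL = auto()     # reasoning text repairs the error; answer is right
    UNFAITHFUL = auto()   # reasoning text stays invalid, yet answer is right

@dataclass
class Trace:
    reasoning_valid: bool  # did the continuation correct the injected error?
    answer_correct: bool   # does the final answer match the gold label?

def classify(trace: Trace) -> Recovery:
    if trace.answer_correct and trace.reasoning_valid:
        return Recovery.FAITHFUL
    if trace.answer_correct and not trace.reasoning_valid:
        # Correct answer despite invalid reasoning: evidence of unfaithfulness.
        return Recovery.UNFAITHFUL
    return Recovery.NO_RECOVERY

# Example: the model never fixes the injected error in its reasoning text,
# but its final answer is still correct -> an unfaithful recovery.
print(classify(Trace(reasoning_valid=False, answer_correct=True)))
```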