Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning (2402.13950v4)
Abstract: LLMs have been shown to perform better when asked to reason step-by-step before answering a question. However, it is unclear to what degree the model's final answer is faithful to the stated reasoning steps. In this paper, we perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps generated by the LLM influence the final outcome and find that LLMs do not reliably use their intermediate reasoning steps when generating an answer. To address this issue, we introduce FRODO, a framework to tailor small-sized LMs to generate correct reasoning steps and robustly reason over these steps. FRODO consists of an inference module that learns to generate correct reasoning steps using an implicit causal reward function and a reasoning module that learns to faithfully reason over these intermediate inferences using a counterfactual and causal preference objective. Our experiments show that FRODO significantly outperforms four competitive baselines. Furthermore, FRODO improves the robustness and generalization ability of the reasoning LM, yielding higher performance on out-of-distribution test sets. Finally, we find that FRODO's rationales are more faithful to its final answer predictions than standard supervised fine-tuning.
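The intervention at the heart of the paper's causal mediation analysis can be illustrated with a toy sketch: corrupt the model's stated reasoning and check whether the final answer changes. This is not the paper's implementation; `answer` is a hypothetical stand-in for an LLM, and `reasoning_sensitivity` is an assumed name for the measured quantity.

```python
# Toy sketch of the intervention behind causal mediation analysis of
# chain-of-thought: if corrupting the rationale never changes the answer,
# the model is not actually using its stated reasoning.

def answer(question: str, rationale: str) -> str:
    """Hypothetical stand-in 'model': reads the result off the rationale
    when it is present, otherwise falls back to a fixed guess."""
    if "24" in rationale:
        return "24"
    return "unknown"

def reasoning_sensitivity(examples, corrupt):
    """Fraction of examples whose answer changes under an intervention on
    the rationale. High = the answer causally depends on the stated
    reasoning; low = the reasoning is largely ignored."""
    changed = 0
    for question, rationale in examples:
        original = answer(question, rationale)
        intervened = answer(question, corrupt(rationale))
        changed += original != intervened
    return changed / len(examples)

examples = [("What is 3 * 8?", "3 * 8 = 24, so the answer is 24.")]
# Intervention: delete the computed value from the rationale.
score = reasoning_sensitivity(examples, corrupt=lambda r: r.replace("24", "?"))
print(score)  # 1.0 for this stub, whose answer tracks the rationale
```

A real analysis would swap the stub for model calls and use targeted corruptions (e.g. replacing a correct intermediate step with a plausible but wrong one), which is closer in spirit to what the paper measures across twelve LLMs.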
Authors: Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings