
Faithful Chain-of-Thought Reasoning (2301.13379v3)

Published 31 Jan 2023 in cs.CL

Abstract: While Chain-of-Thought (CoT) prompting boosts language models' (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (a.k.a. faithfulness). We propose Faithful CoT, a reasoning framework involving two stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning chain) and Problem Solving (reasoning chain $\rightarrow$ answer), using an LM and a deterministic solver respectively. This guarantees that the reasoning chain provides a faithful explanation of the final answer. Aside from interpretability, Faithful CoT also improves empirical performance: it outperforms standard CoT on 9 of 10 benchmarks from 4 diverse domains, with a relative accuracy gain of 6.3% on Math Word Problems (MWP), 3.4% on Planning, 5.5% on Multi-hop Question Answering (QA), and 21.4% on Relational Inference. Furthermore, with GPT-4 and Codex, it sets the new state-of-the-art few-shot performance on 7 datasets (with 95.0+ accuracy on 6 of them), showing a strong synergy between faithfulness and accuracy.

Faithful Chain-of-Thought Reasoning: An Overview

Chain-of-Thought (CoT) prompting yields significant performance improvements for LLMs across complex reasoning tasks. However, the generated reasoning chains are often unfaithful: they do not necessarily reflect the process by which the model actually arrives at its answer. The paper "Faithful Chain-of-Thought Reasoning" introduces a two-stage framework, Faithful CoT, to address this issue. By separating reasoning into a Translation stage and a Problem Solving stage, the approach guarantees that the reasoning chain reflects how the answer is obtained.

Methodology

Faithful CoT employs a two-step framework:

  1. Translation:
    • Converts a natural language query into a mixed natural and symbolic language reasoning chain using an LLM.
    • Decomposes the query into simpler, dependent subproblems expressed in natural language.
    • Transforms these subproblems into executable symbolic language programs (e.g., Python, Datalog); an illustrative chain is sketched just after this list.
  2. Problem Solving:
    • Uses deterministic solvers to execute the symbolic reasoning chain, providing a direct and faithful path to the final answer.
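
To make the Translation output concrete, here is a hypothetical reasoning chain in the interleaved style described above: each natural-language subquestion appears as a comment, and the symbolic step that answers it is plain Python. The question, numbers, and variable names are invented for illustration, not taken from the paper.

```python
# Illustrative reasoning chain for a math word problem, in the mixed
# natural/symbolic style Faithful CoT produces (hypothetical example).

# Q: A baker made 3 trays of 12 muffins each and sold 20. How many are left?

# Subquestion 1: How many muffins were baked in total?
total_muffins = 3 * 12

# Subquestion 2 (depends on 1): How many muffins remain after selling 20?
answer = total_muffins - 20

print(answer)  # 16 -- computed by the Python interpreter, not by the LM
```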

The LLM's role is confined to generating the reasoning chain; a deterministic solver then executes it. Because the answer is computed directly from the chain, every step in the chain causally contributes to the final result, which is what makes the explanation faithful and improves interpretability.
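
A minimal end-to-end sketch of the two stages follows. The `call_lm` helper, the prompt contents, and the use of Python's `exec` as the deterministic solver are assumptions for illustration; the paper pairs few-shot prompting (with models such as Codex and GPT-4) with an executor suited to each task domain.

```python
# Sketch of the two-stage Faithful CoT pipeline (illustrative; names and
# prompt format are assumptions, not the paper's exact implementation).

FEWSHOT_PROMPT = "..."  # few-shot exemplars: (question, reasoning chain) pairs


def call_lm(prompt: str) -> str:
    """Placeholder for a call to an LLM completion API."""
    raise NotImplementedError


def translate(question: str) -> str:
    """Stage 1 (Translation): the LM maps the NL query to an
    executable reasoning chain, guided by few-shot exemplars."""
    return call_lm(f"{FEWSHOT_PROMPT}\n# Q: {question}\n")


def solve(reasoning_chain: str):
    """Stage 2 (Problem Solving): a deterministic solver -- here simply
    the Python interpreter -- executes the chain, so the final answer
    follows from the chain by construction."""
    namespace: dict = {}
    exec(reasoning_chain, namespace)  # run LM-generated code sandboxed
    return namespace.get("answer")


# Usage (once call_lm is wired to a real model):
# chain = translate("A baker made 3 trays of 12 muffins each and sold 20. "
#                   "How many are left?")
# print(solve(chain))
```

Because the answer is produced by the solver rather than generated token by token by the LM, the chain cannot diverge from the answer it explains.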

Performance Evaluation

Faithful CoT was evaluated on ten benchmarks spanning four domains: Math Word Problems (MWP), Multi-hop Question Answering (QA), Planning, and Relational Inference. It outperformed standard CoT on 9 of the 10 datasets, with the largest gains on harder benchmarks such as GSM8K (math) and CLUTRR (relational inference).

  • Math Word Problems: a 6.3% relative accuracy gain over standard CoT, with substantial improvements on harder datasets such as GSM8K.
  • Multi-hop QA: a 5.5% relative gain on questions requiring several dependent reasoning steps.
  • Planning: a 3.4% relative gain, achieved by decomposing complex instructions into executable goals.
  • Relational Inference: a 21.4% relative gain, the largest across the four domains, including on CLUTRR.

Implications and Future Directions

The framework addresses the critical issue of faithfulness in LLM-generated reasoning chains, providing a path toward more interpretable, reliable, and trustworthy AI systems, which is especially important in high-stakes decision-making applications. Because the stated reasoning is exactly what the solver executes, it mitigates the risk of over-reliance on misleading explanations.

The success of this framework opens avenues for further exploration in several areas:

  • Improving Translation Transparency: While Faithful CoT makes the reasoning process faithful, the initial translation stage remains opaque. Future work might explore visualization or external validation techniques to enhance transparency.
  • Expanding to Broader Tasks: The integration of more symbolic languages and solvers could broaden the application to a wider array of complex reasoning tasks.

Overall, Faithful CoT strengthens the synergy between performance and interpretability, demonstrating that enhancing model explainability need not compromise accuracy. This innovation contributes significantly to the field of AI reasoning, offering a robust methodology that balances complex task solving with genuine interpretability.

Authors (8)
  1. Qing Lyu (35 papers)
  2. Shreya Havaldar (10 papers)
  3. Adam Stein (9 papers)
  4. Li Zhang (690 papers)
  5. Delip Rao (5 papers)
  6. Eric Wong (47 papers)
  7. Marianna Apidianaki (29 papers)
  8. Chris Callison-Burch (102 papers)
Citations (172)