Measuring Faithfulness in Chain-of-Thought Reasoning (2307.13702v1)

Published 17 Jul 2023 in cs.AI, cs.CL, and cs.LG

Abstract: LLMs perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.

Measuring Faithfulness in Chain-of-Thought Reasoning

The paper "Measuring Faithfulness in Chain-of-Thought Reasoning" presents a focused investigation into the faithfulness of reasoning produced by LLMs when using Chain-of-Thought (CoT) prompting. The authors critically examine the disconnect between the CoT generated explanations and the LLM's true reasoning processes. This exploration is particularly relevant given the increasing utilization of LLMs in decision-critical applications, where understanding the reasoning behind an output is crucial.

Key Hypotheses and Methodology

The paper explores several hypotheses concerning CoT's potential unfaithfulness:

  1. Post-Hoc Reasoning: The CoT may be generated after the model has already reached its conclusion internally, serving merely as a retroactive explanation.
  2. Test-Time Computation: The authors hypothesize that the performance gain from CoT could simply result from the extra computation offered by the intervening tokens between the query and the final answer.
  3. Encoded Reasoning: Another hypothesis suggests that LLMs might encode useful information within the CoT using subtle phrasing changes, which are not interpretable by human readers but enhance performance.

To test these hypotheses, several interventions on CoT were conducted:

  • Early Answering: Truncate the CoT after each step to evaluate whether the incomplete reasoning already yields the model's final answer (a minimal sketch of this metric follows the list).
  • Adding Mistakes: Introduce errors into CoT steps to see whether they alter the final answer.
  • Filler Tokens: Replace the CoT with meaningless text (e.g., ellipses) to determine whether additional test-time computation alone is beneficial.
  • Paraphrasing: Reword the initial steps of the CoT to identify reliance on its specific phrasing.
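
As a rough illustration of the early-answering intervention described above, the sketch below (not from the paper) measures how often a model's answer after a truncated CoT already matches its answer after the full CoT; answer_fn is a hypothetical stand-in for a model query, and the prompt format is assumed rather than taken from the paper.

```python
# Hypothetical sketch of the "early answering" intervention: truncate the CoT
# after each step, ask the model for a final answer, and check how often that
# answer already matches the answer given with the full CoT. High agreement at
# early truncation points is consistent with post-hoc reasoning; low early
# agreement suggests the model genuinely conditions on the stated reasoning.
from typing import Callable, List

def early_answering_curve(
    question: str,
    cot_steps: List[str],             # the model's CoT, split into steps
    answer_fn: Callable[[str], str],  # assumed interface: prompt -> final answer
) -> List[float]:
    """For each truncation point k (0..len(cot_steps) steps kept), return 1.0
    if the truncated-CoT answer matches the full-CoT answer, else 0.0."""
    full_prompt = f"{question}\n" + "\n".join(cot_steps) + "\nSo the answer is:"
    full_answer = answer_fn(full_prompt)

    agreements = []
    for k in range(len(cot_steps) + 1):  # k = number of CoT steps kept
        truncated = f"{question}\n" + "\n".join(cot_steps[:k]) + "\nSo the answer is:"
        agreements.append(1.0 if answer_fn(truncated) == full_answer else 0.0)
    return agreements

if __name__ == "__main__":
    # Toy usage with a stub in place of a real model call.
    stub = lambda prompt: "B" if "step 3" in prompt else "A"
    steps = ["step 1: ...", "step 2: ...", "step 3: ..."]
    print(early_answering_curve("Q: ...", steps, stub))  # [0.0, 0.0, 0.0, 1.0]
```

In the paper, curves of this kind are averaged over many task examples; the other interventions (adding mistakes, filler tokens, paraphrasing) follow the same pattern of perturbing the CoT and comparing the resulting answers.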

Results and Implications

The results reveal substantial variation across tasks in how strongly models rely on the CoT. On tasks such as AQuA and LogiQA, the final answer depends heavily on the stated reasoning, whereas on simpler tasks such as ARC (Easy) the model largely ignores it. This challenges the assumption that producing a CoT guarantees the stated reasoning actually drives the answer: the benefits of CoT are not uniform across model capabilities or task complexities.

Further, the paper reports an inverse scaling trend in CoT faithfulness: smaller models often exhibit more faithful reasoning than their larger counterparts on many tasks, suggesting that as models become more capable, they rely less on the explicit reasoning chains they produce.

Future Directions

The findings open several avenues for future research:

  • Development of training methodologies that foster faithful reasoning representations in LLMs.
  • Exploration of alternative frameworks beyond CoT that provide more reliable interpretability without compromising the model's accuracy.
  • Investigating whether other prompting techniques, such as subquestion generation or decision-tree logic, might inherently encourage the production of more faithful reasoning.

Conclusion

This work critically assesses the faithfulness of CoT reasoning, clarifying when and under what conditions CoT explanations can be relied upon. By characterizing key weaknesses in model-generated explanations, it contributes to enhancing the transparency and reliability of LLMs, particularly in high-stakes environments.

The implications stress the need for careful selection of model configurations and prompting strategies to ensure faithful AI reasoning, ultimately guiding the development of more trustworthy and interpretable AI systems.

Authors (30)
  1. Tamera Lanham (6 papers)
  2. Anna Chen (16 papers)
  3. Ansh Radhakrishnan (6 papers)
  4. Benoit Steiner (17 papers)
  5. Carson Denison (10 papers)
  6. Danny Hernandez (16 papers)
  7. Dustin Li (6 papers)
  8. Esin Durmus (38 papers)
  9. Evan Hubinger (16 papers)
  10. Jackson Kernion (14 papers)
  11. Kamilė Lukošiūtė (10 papers)
  12. Karina Nguyen (11 papers)
  13. Newton Cheng (13 papers)
  14. Nicholas Joseph (18 papers)
  15. Nicholas Schiefer (18 papers)
  16. Oliver Rausch (9 papers)
  17. Robin Larson (6 papers)
  18. Sam McCandlish (24 papers)
  19. Sandipan Kundu (47 papers)
  20. Saurav Kadavath (14 papers)
Citations (125)