Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? (2508.19827v1)
Abstract: Recent work has demonstrated that Chain-of-Thought (CoT) often yields limited gains for soft-reasoning problems such as analytical and commonsense reasoning. CoT can also be unfaithful to a model's actual reasoning. We investigate the dynamics and faithfulness of CoT in soft-reasoning tasks across instruction-tuned, reasoning and reasoning-distilled models. Our findings reveal differences in how these models rely on CoT, and show that CoT influence and faithfulness are not always aligned.
Collections
Sign up for free to add this paper to one or more collections.
Summary
- The paper demonstrates that distilled-reasoning models significantly rely on chain-of-thought processing, revising initial answers in 65% of cases.
- The methodology uses confidence trajectory analysis and cue injection to disentangle causal influence from mere post-hoc rationalizations.
- The findings highlight that training regimes, such as distillation versus RLHF, critically affect both the faithfulness and effectiveness of CoT explanations.
Chain-of-Thought Dynamics in LLMs: Influence, Faithfulness, and Model-Specific Reasoning
Introduction
This paper presents a systematic analysis of Chain-of-Thought (CoT) reasoning in LLMs on soft-reasoning tasks, focusing on two central questions: (1) whether CoT actively guides model predictions or merely serves as a post-hoc rationalization, and (2) to what extent CoT explanations are faithful to the model's internal decision process. The paper compares instruction-tuned, multi-step reasoning, and distilled-reasoning LLMs, employing confidence trajectory analysis and targeted cue-injection to disentangle the causal influence and faithfulness of CoT traces.
Methodology
Model Families and Datasets
The analysis covers three LLM families:
- Instruction-tuned models: e.g., Qwen2.5-Instruct, Llama-8B-Instruct, trained with supervised fine-tuning and RLHF.
- Multi-step reasoning models: e.g., Qwen3-32B, QwQ-32B, further trained for explicit multi-step reasoning via RL.
- Distilled-reasoning models: e.g., R1-Distill-Qwen, R1-Distill-Llama, distilled from stronger reasoning LLMs.
Evaluation is conducted on multiple-choice soft-reasoning datasets where prior work has shown limited or negative CoT gains: CommonsenseQA, StrategyQA, MUSR, LSAT, and GPQA.
Confidence Trajectory Analysis
For each model, the probability assigned to the final answer is tracked after each CoT step, yielding a confidence trajectory. This approach quantifies the extent to which intermediate reasoning steps causally influence the model's prediction, as opposed to simply justifying a pre-formed answer.
Faithfulness via Cue Injection
Faithfulness is probed by injecting misleading cues (e.g., a "Stanford Professor" suggestion or metadata indicating a specific answer) into the prompt. If the model's answer changes to match the cue but the CoT omits any mention of the cue, the explanation is deemed unfaithful. Faithfulness is operationalized as explicit verbalization of cue usage in the CoT, classified using GPT-4.1.
Empirical Findings
CoT Influence: Model-Specific Dynamics
- Distilled-reasoning models exhibit a high rate of answer revision after CoT generation (65% of cases), substantially exceeding both instruction-tuned (25%) and reasoning models (24%). These revisions frequently correct initial mistakes, indicating that CoT is not merely decorative but essential for these models' performance.
- Instruction-tuned and reasoning models rarely change their initial answer after CoT, and their final accuracy is often comparable to distilled models. This suggests that, for these models, CoT often functions as a post-hoc rationalization rather than an active reasoning process.
- Entropy analysis reveals that distilled models start with higher initial uncertainty, relying on CoT to resolve ambiguity, whereas instruction-tuned models are more decisive from the outset.
Confidence Trajectories
- Instruction-tuned models display flat confidence trajectories on most tasks, indicating that CoT steps do not meaningfully alter the model's belief in its answer. On more challenging tasks (e.g., GPQA), some dynamic but generally ineffective confidence shifts are observed.
- Distilled-reasoning models show pronounced increases in answer confidence during CoT, often culminating in a sharp rise at the final step. This pattern supports the claim that CoT is causally necessary for these models' predictions.
- Reasoning models (e.g., Qwen3-32B) exhibit mixed behavior: flat trajectories on easier tasks (suggesting post-hoc rationalization), but more dynamic confidence changes on harder tasks or in models like QwQ-32B.
Faithfulness: Disentangling Causal Influence and Explanation
- Unfaithful but influential CoTs: In distilled models, unfaithful CoTs (those that do not acknowledge the cue) can still guide the model toward the cued answer, as evidenced by increasing confidence trajectories. This demonstrates that causal influence and faithfulness are not aligned.
- Faithful but non-influential CoTs: In reasoning models, even when the CoT explicitly acknowledges the cue, the confidence trajectory may remain flat, indicating that the explanation is faithful but not causally influential.
- Faithfulness scores (proportion of cue-induced answer changes with explicit verbalization) are generally low for instruction-tuned models and higher for reasoning and some distilled models, but with substantial variation across datasets and cue types.
Theoretical and Practical Implications
Redefining Faithfulness
The findings challenge the adequacy of faithfulness definitions based solely on causal dependence or explicit verbalization. A CoT can be unfaithful yet causally influential, or faithful yet epiphenomenal. This disconnect complicates the use of CoT explanations for model interpretability and reliability, especially in high-stakes or agentic applications.
Model Training and Reasoning Behavior
The heavy reliance of distilled-reasoning models on CoT is hypothesized to result from their exposure to procedural knowledge during distillation, in contrast to the RLHF-driven preference for human-like explanations in instruction-tuned models. This suggests that post-training strategies (distillation vs. RLHF) fundamentally shape not only model performance but also the nature and faithfulness of their reasoning traces.
Evaluation and Deployment
For practitioners, the results indicate that:
- Distilled-reasoning models may be preferable when stepwise, causally effective reasoning is required, but their CoT explanations may still be unfaithful.
- Instruction-tuned models can achieve high accuracy without reliance on CoT, but their explanations are often post-hoc and uninformative about the true decision process.
- Faithfulness metrics based on explicit verbalization or causal perturbation should be interpreted with caution, as they may not capture the full spectrum of model reasoning dynamics.
Future Directions
- Generalization to open-ended and agentic tasks: Extending the analysis to long-form generation, planning, and software engineering tasks is necessary to assess the robustness of these findings beyond multiple-choice settings.
- Interventions for improved faithfulness: Investigating training or inference-time interventions (e.g., causal attribution, activation steering) to align CoT explanations with actual model reasoning.
- Deeper mechanistic interpretability: Combining confidence trajectory analysis with model-internal probes (e.g., attention, activation attributions) to more directly link CoT traces to underlying computation.
Conclusion
This paper provides a nuanced characterization of CoT dynamics in LLMs, demonstrating that the causal influence and faithfulness of CoT explanations are not necessarily aligned and are strongly modulated by model training regime. Distilled-reasoning models depend on CoT for performance, while instruction-tuned and reasoning models often use CoT as a post-hoc justification. The results underscore the need for more sophisticated metrics and interventions to ensure that CoT explanations are both causally effective and faithful, with significant implications for the deployment and trustworthiness of LLMs in real-world applications.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Follow-up Questions
- How do confidence trajectory analyses in this paper enhance our understanding of CoT's causal role in LLM predictions?
- What distinguishes instruction-tuned models from distilled-reasoning models in terms of chain-of-thought behavior?
- How does targeted cue injection help measure the faithfulness of CoT explanations in various LLMs?
- What are the practical implications of unfaithful yet influential chain-of-thoughts for deploying LLMs in high-stakes scenarios?
- Find recent papers about chain-of-thought reasoning in LLMs.
Related Papers
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters (2022)
- Faithful Chain-of-Thought Reasoning (2023)
- Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting (2023)
- Measuring Faithfulness in Chain-of-Thought Reasoning (2023)
- How Likely Do LLMs with CoT Mimic Human Reasoning? (2024)
- On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models (2024)
- Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Arithmetic Reasoning (2024)
- Reasoning Models Don't Always Say What They Think (2025)
- Reasoning Models Better Express Their Confidence (2025)
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens (2025)
Authors (4)
Tweets
alphaXiv
- Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? (19 likes, 0 questions)