Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 97 tok/s

Gemini 2.5 Pro 44 tok/s Pro

GPT-5 Medium 26 tok/s Pro

GPT-5 High 27 tok/s Pro

GPT-4o 100 tok/s Pro

GPT OSS 120B 464 tok/s Pro

Kimi K2 186 tok/s Pro

2000 character limit reached

Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? (2508.19827v1)

Published 27 Aug 2025 in cs.AI and cs.CL

Abstract: Recent work has demonstrated that Chain-of-Thought (CoT) often yields limited gains for soft-reasoning problems such as analytical and commonsense reasoning. CoT can also be unfaithful to a model's actual reasoning. We investigate the dynamics and faithfulness of CoT in soft-reasoning tasks across instruction-tuned, reasoning and reasoning-distilled models. Our findings reveal differences in how these models rely on CoT, and show that CoT influence and faithfulness are not always aligned.

Collections

Summary

The paper demonstrates that distilled-reasoning models significantly rely on chain-of-thought processing, revising initial answers in 65% of cases.
The methodology uses confidence trajectory analysis and cue injection to disentangle causal influence from mere post-hoc rationalizations.
The findings highlight that training regimes, such as distillation versus RLHF, critically affect both the faithfulness and effectiveness of CoT explanations.

Chain-of-Thought Dynamics in LLMs: Influence, Faithfulness, and Model-Specific Reasoning

Introduction

This paper presents a systematic analysis of Chain-of-Thought (CoT) reasoning in LLMs on soft-reasoning tasks, focusing on two central questions: (1) whether CoT actively guides model predictions or merely serves as a post-hoc rationalization, and (2) to what extent CoT explanations are faithful to the model's internal decision process. The paper compares instruction-tuned, multi-step reasoning, and distilled-reasoning LLMs, employing confidence trajectory analysis and targeted cue-injection to disentangle the causal influence and faithfulness of CoT traces.

Methodology

Model Families and Datasets

The analysis covers three LLM families:

Instruction-tuned models: e.g., Qwen2.5-Instruct, Llama-8B-Instruct, trained with supervised fine-tuning and RLHF.
Multi-step reasoning models: e.g., Qwen3-32B, QwQ-32B, further trained for explicit multi-step reasoning via RL.
Distilled-reasoning models: e.g., R1-Distill-Qwen, R1-Distill-Llama, distilled from stronger reasoning LLMs.

Evaluation is conducted on multiple-choice soft-reasoning datasets where prior work has shown limited or negative CoT gains: CommonsenseQA, StrategyQA, MUSR, LSAT, and GPQA.

Confidence Trajectory Analysis

For each model, the probability assigned to the final answer is tracked after each CoT step, yielding a confidence trajectory. This approach quantifies the extent to which intermediate reasoning steps causally influence the model's prediction, as opposed to simply justifying a pre-formed answer.

Faithfulness via Cue Injection

Faithfulness is probed by injecting misleading cues (e.g., a "Stanford Professor" suggestion or metadata indicating a specific answer) into the prompt. If the model's answer changes to match the cue but the CoT omits any mention of the cue, the explanation is deemed unfaithful. Faithfulness is operationalized as explicit verbalization of cue usage in the CoT, classified using GPT-4.1.

Empirical Findings

CoT Influence: Model-Specific Dynamics

Distilled-reasoning models exhibit a high rate of answer revision after CoT generation (65% of cases), substantially exceeding both instruction-tuned (25%) and reasoning models (24%). These revisions frequently correct initial mistakes, indicating that CoT is not merely decorative but essential for these models' performance.
Instruction-tuned and reasoning models rarely change their initial answer after CoT, and their final accuracy is often comparable to distilled models. This suggests that, for these models, CoT often functions as a post-hoc rationalization rather than an active reasoning process.
Entropy analysis reveals that distilled models start with higher initial uncertainty, relying on CoT to resolve ambiguity, whereas instruction-tuned models are more decisive from the outset.

Confidence Trajectories

Instruction-tuned models display flat confidence trajectories on most tasks, indicating that CoT steps do not meaningfully alter the model's belief in its answer. On more challenging tasks (e.g., GPQA), some dynamic but generally ineffective confidence shifts are observed.
Distilled-reasoning models show pronounced increases in answer confidence during CoT, often culminating in a sharp rise at the final step. This pattern supports the claim that CoT is causally necessary for these models' predictions.
Reasoning models (e.g., Qwen3-32B) exhibit mixed behavior: flat trajectories on easier tasks (suggesting post-hoc rationalization), but more dynamic confidence changes on harder tasks or in models like QwQ-32B.

Faithfulness: Disentangling Causal Influence and Explanation

Unfaithful but influential CoTs: In distilled models, unfaithful CoTs (those that do not acknowledge the cue) can still guide the model toward the cued answer, as evidenced by increasing confidence trajectories. This demonstrates that causal influence and faithfulness are not aligned.
Faithful but non-influential CoTs: In reasoning models, even when the CoT explicitly acknowledges the cue, the confidence trajectory may remain flat, indicating that the explanation is faithful but not causally influential.
Faithfulness scores (proportion of cue-induced answer changes with explicit verbalization) are generally low for instruction-tuned models and higher for reasoning and some distilled models, but with substantial variation across datasets and cue types.

Theoretical and Practical Implications

Redefining Faithfulness

The findings challenge the adequacy of faithfulness definitions based solely on causal dependence or explicit verbalization. A CoT can be unfaithful yet causally influential, or faithful yet epiphenomenal. This disconnect complicates the use of CoT explanations for model interpretability and reliability, especially in high-stakes or agentic applications.

Model Training and Reasoning Behavior

The heavy reliance of distilled-reasoning models on CoT is hypothesized to result from their exposure to procedural knowledge during distillation, in contrast to the RLHF-driven preference for human-like explanations in instruction-tuned models. This suggests that post-training strategies (distillation vs. RLHF) fundamentally shape not only model performance but also the nature and faithfulness of their reasoning traces.

Evaluation and Deployment

For practitioners, the results indicate that:

Distilled-reasoning models may be preferable when stepwise, causally effective reasoning is required, but their CoT explanations may still be unfaithful.
Instruction-tuned models can achieve high accuracy without reliance on CoT, but their explanations are often post-hoc and uninformative about the true decision process.
Faithfulness metrics based on explicit verbalization or causal perturbation should be interpreted with caution, as they may not capture the full spectrum of model reasoning dynamics.

Future Directions

Generalization to open-ended and agentic tasks: Extending the analysis to long-form generation, planning, and software engineering tasks is necessary to assess the robustness of these findings beyond multiple-choice settings.
Interventions for improved faithfulness: Investigating training or inference-time interventions (e.g., causal attribution, activation steering) to align CoT explanations with actual model reasoning.
Deeper mechanistic interpretability: Combining confidence trajectory analysis with model-internal probes (e.g., attention, activation attributions) to more directly link CoT traces to underlying computation.

Conclusion

This paper provides a nuanced characterization of CoT dynamics in LLMs, demonstrating that the causal influence and faithfulness of CoT explanations are not necessarily aligned and are strongly modulated by model training regime. Distilled-reasoning models depend on CoT for performance, while instruction-tuned and reasoning models often use CoT as a post-hoc justification. The results underscore the need for more sophisticated metrics and interventions to ensure that CoT explanations are both causally effective and faithful, with significant implications for the deployment and trustworthiness of LLMs in real-world applications.

PDF Markdown

Paper Prompts

Explore 10 Community Prompts

Follow-up Questions

Authors (4)

Tweets

https://twitter.com/HuggingPapers/status/1961221597320171948

alphaXiv

Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? (19 likes, 0 questions)