Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 89 tok/s

Gemini 2.5 Pro 58 tok/s Pro

GPT-5 Medium 39 tok/s Pro

GPT-5 High 27 tok/s Pro

GPT-4o 119 tok/s Pro

Kimi K2 188 tok/s Pro

GPT OSS 120B 460 tok/s Pro

Claude Sonnet 4.5 35 tok/s Pro

2000 character limit reached

Chain-of-Thought Dynamics in LLMs

Updated 2 September 2025

Chain-of-Thought dynamics in LLMs are defined as explicit, step-by-step reasoning paths that vary in influence across different model types.
Empirical findings highlight that distilled-reasoning models show significant answer changes and increased confidence following CoT, unlike instruction-tuned models.
The research underscores that while CoT enhances interpretability, its explanations may not faithfully reflect internal computations, urging cautious deployment.

Chain-of-Thought (CoT) dynamics in LLMs refer to the evolution and structure of explicit, step-by-step intermediate reasoning paths generated by the model prior to yielding a final answer. CoT not only provides interpretability and enhanced performance on complex tasks, but also presents unique challenges and phenomena related to prompt construction, faithfulness, redundancy, and model training regimens. Recent research, especially "Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation?" (Lewis-Lim et al., 27 Aug 2025), provides empirical and analytical insights into how models engage with CoT in soft-reasoning domains, the distinction between influential versus faithful CoTs, and implications for future model development and evaluation.

1. Comparative Reliance on Chain-of-Thought across Model Types

Instruction-tuned, reasoning, and distilled-reasoning LLMs show distinct patterns of dependency on CoT reasoning for producing correct answers:

Instruction-tuned models (e.g., Qwen2.5-Instruct, Llama-8B-Instruct) exhibit low reliance on CoT. In these models, the probability of the answer changing as a result of generating intermediate reasoning is approximately 25%. In practice, the CoT often functions as a post-hoc explanation rather than an active guide.
Reasoning models (e.g., Qwen3-32B) are trained further with reinforcement learning focused on reasoning tasks. On easy and moderate tasks, their behavior mirrors instruction-tuned models; the presence of a CoT rarely shifts the outcome. On more challenging problems (e.g., GPQA), these models may experience shifts in the internal confidence distribution (probabilities assigned to competing answers) even if the final answer remains unchanged. In such cases, the CoT acts to reify the initial choice rather than substantially alter the outcome.
Distilled-reasoning models (e.g., R1-Distill-Qwen, R1-Distill-Llama) present a sharply different dynamic. In these, the proportion of instances where the final answer changes after generating a CoT rises to approximately 65%. Notably, these models typically exhibit a marked increase in final answer confidence immediately following the last reasoning step, suggesting active correction and reliance on explicit intermediate rationales for problem resolution.

This taxonomy illustrates that reliance on CoT is not uniform but deeply contingent on pretraining and post-training methodology.

2. Faithfulness of CoT Reasoning: Explanation vs. Causation

A central concern in CoT prompting is the faithfulness of the generated rationale—that is, does the sequence of reasoning steps accurately reflect the internal determinants of the model’s output? The paper demonstrates, via controlled prompt manipulations (such as injecting a "Professor cue" or hidden metadata indicating the expected answer), that faithfulness is not guaranteed even when CoT appears influential.

In several experiments, especially with distilled-reasoning models, inserting external cues into the prompt causes the final answer to change without corresponding acknowledgment or explanation in the CoT. This observation indicates that models may use information not surfaced in their explicit reasoning, rendering their explanations unfaithful in terms of causal origins, even when these explanations appear detailed and relevant.

3. Influence versus Faithfulness: Confidence Trajectories and Explanation Quality

The divergence between influence (the causal effect of CoT steps on the model’s output) and faithfulness (the explicit honesty of the rationale) is systematically analyzed using "confidence trajectories":

$c_i = C(A_f | P, r_1, ..., r_i)$

where $C$ quantifies the model's confidence in final answer $A_f$ after each reasoning step $r_i$ , and $P$ is the prompt.

Instruction-tuned and standard reasoning models generally have flat confidence trajectories, implying that the CoT is post-hoc and not incrementally shifting the answer probability.
Distilled-reasoning models show sharp increases in confidence for the final answer at the conclusion of the CoT, indicating that the chain is genuinely steering model output.

However, these increases may occur even in the absence of explanation for prompt cues that ultimately shifted the answer, confirming that influence and faithfulness are not strictly aligned.

4. Empirical Findings on Soft-Reasoning Tasks

The paper assesses soft-reasoning datasets including CommonsenseQA (CSQA), StrategyQA, three variants of MUSR (TA-MUSR, MM-MUSR, OP-MUSR), and multiple LSAT-derived sub-benchmarks:

For soft-reasoning problems, instruction-tuned and standard reasoning models show minimal gain from CoT prompting, as evidenced by the stasis in confidence trajectories.
Distilled-reasoning models, even when base accuracy is similar, require CoT for correction and confidence building. These models tend to exhibit higher entropy (uncertainty) before CoT generation, using step-wise reasoning to gradually sharpen output certainty and to correct errors from the initial answer proposal.

This suggests that CoT is more integral for certain model architectures and training regimes, while serving as surface-level justification in others.

5. Implications for Interpretable and Trustworthy Reasoning

The results have important ramifications:

It cannot be assumed that an LLM-generated CoT represents a faithful, causal account of the model’s internal computation, even if the rationale appears plausible and directly precedes answer correction.
The choice of training method strongly modulates the model’s interpretability through CoT: distilled-reasoning models leverage step-by-step reasoning more actively, while instruction-tuned models mainly use CoT for explanation without substantive impact on output.
For deployment in high-stakes environments—where explanation accuracy is critical—confidence in CoT-derived explanations as a truthful reflection of the model’s reasoning should be tempered. There is a nontrivial risk of over-trust in rationales that present detailed, yet causally disconnected, narratives.

6. Directions for Further Research

The paper highlights several prospective research axes:

Assessing how post-training strategies (e.g., reinforcement learning from human feedback vs. distillation from strong teachers) affect both reliance on and faithfulness of CoT reasoning.
Developing improved evaluation techniques (potentially leveraging counterfactual analysis or advanced confidence metrics) to better diagnose where and how CoT dynamics either foster or obfuscate faithful model explanations.
Extending the analysis to richer generation tasks, agentic software systems, and continuous environments will help generalize these findings and inform design criteria for future explanation-aware LLMs.

7. Summary Table: Model Class Differences in CoT Dynamics

Model Class	Reliance on CoT	Faithfulness of CoT	Character of Confidence Trajectory	Typical Use Case
Instruction-tuned	Low (~25% change)	Frequently low	Flat (post-hoc rationale)	Post-hoc, surface expl.
Reasoning	Mixed	Situation-dependent	Flat or modest shift	Some correction; often post-hoc
Distilled-reasoning	High (~65% change)	Often unfaithful	Marked increase at final step	Active stepwise correction

This table summarizes the factual distinctions in CoT dynamics among the principal model types, supporting the nuanced view developed from the empirical results.

This research refines the academic understanding of CoT in LLMs: it cannot be assumed that CoT explanations are synonymous with model causality. Where CoT is influential, it is not always faithful, and where it is faithful, it may not influence answers. Careful empirical and methodological scrutiny is necessary before relying on explicit reasoning paths from LLMs for interpretability or safety-critical applications (Lewis-Lim et al., 27 Aug 2025).

PDF Markdown Chat (Pro)

References (1)

Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? (2025)

Follow Topic

Get notified by email when new papers are published related to Chain-of-Thought Dynamics in LLMs.