Chain-of-Thought Explanation

Updated 3 June 2026
  • Chain-of-thought (CoT) explanation is a reasoning paradigm in which a model explicitly generates sequential intermediate steps that act as an expanding scratchpad for complex tasks.
  • Token-level memory growth in CoT contrasts with fixed-size memory systems such as compressed latent loops (CLL), giving CoT unbounded expressivity and adaptability.
  • Theoretical results show that polynomial-length CoT can simulate polynomial-time computations, and empirical probes show that reasoning succeeds only when persistent memory matches working-memory demand.

Chain-of-thought (CoT) explanation is a reasoning paradigm in which a model, typically a large language model (LLM) or other transformer-based architecture, is prompted or trained to explicitly generate a sequence of intermediate reasoning steps before arriving at a final answer. CoT explanations serve as an externalized “scratchpad” or memory, revealing the model’s reasoning trajectory and often enabling modular, transparent, and more powerful multi-step inference.

1. Formal Definitions and Memory Regimes

The mechanistic underpinnings of CoT are best clarified by comparison with recurrent (looped) transformer architectures, as detailed in (Zhang, 29 May 2026). Three memory regimes at inference time can be formalized:

  • Compressed Latent Loop ($\text{CLL}[s,d,p,T]$): The reasoning state is held in a fixed number of slots, $z_t \in (\mathbb{F}_p^d)^s$, and updated recurrently, with persistent memory of size $M_\text{CLL} = s \cdot d \cdot p$ bits. This memory is independent of the number of reasoning steps $T$ and of the CoT length $\ell$.
  • Full Sequence-State Loop ($\text{SSL}[d,p,T]$): The recurrent state is a length-$n$ sequence of vectors, $H_t \in (\mathbb{F}_p^d)^n$, so $M_\text{SSL} = n \cdot d \cdot p$ bits. Persistent memory grows linearly with input size, allowing the reasoning “scratchpad” to be $\Theta(n)$ in size.
  • Chain-of-Thought Scratchpad ($\text{CoT}[d,p,\ell]$): The model emits $\ell$ tokens as its scratchpad, each appended as a new context token. Persistent memory is $M_\text{CoT} = \ell \cdot d \cdot p$ bits and, crucially, grows without bound during inference as $\ell$ increases.

CLL models remain fundamentally “small-space” reasoners: for any given slot budget $s$, no increase in the number of looped reasoning steps $T$ augments the effective size of their memory. In contrast, growing the CoT scratchpad via token generation yields unbounded memory accumulation at inference, an expressivity regime provably beyond the reach of fixed-size loops.
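The three memory budgets can be compared numerically with a minimal sketch. The function names and the example values of d, p, s, and n are illustrative assumptions, and the CoT formula assumes each emitted token costs d·p bits, by analogy with the CLL and SSL formulas above.

```python
def cll_memory_bits(s: int, d: int, p: int) -> int:
    """Compressed latent loop: s slots, each holding d numbers of p bits."""
    return s * d * p


def ssl_memory_bits(n: int, d: int, p: int) -> int:
    """Sequence-state loop: one d-dimensional vector per input position."""
    return n * d * p


def cot_memory_bits(ell: int, d: int, p: int) -> int:
    """CoT scratchpad: assumed d * p bits per emitted token (hypothetical)."""
    return ell * d * p


# Illustrative values only; none of these numbers come from the source.
d, p, s, n = 64, 16, 4, 128
for steps in (10, 100, 1000):
    # The CLL and SSL budgets ignore the step count; the CoT budget tracks it.
    print(steps,
          cll_memory_bits(s, d, p),
          ssl_memory_bits(n, d, p),
          cot_memory_bits(steps, d, p))
```

In this sketch only the last column changes from row to row, which is the point of the comparison: the loop budgets are fixed once s, n, d, and p are chosen.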

2. Computational Expressivity and Complexity Separation

Memory bounds and complexity: The theoretical expressivity of these regimes is established by simulating their workspace in Turing-machine terms. Any $\text{CLL}[s,d,p,T]$ can be simulated with $O(s \cdot d \cdot p)$ bits of work tape. When the budget $s \cdot d \cdot p$ grows sub-polynomially in the input length $n$, this restricts the architecture to sub-polynomial space.

Under the standard complexity assumption that $\mathsf{P}$ is not contained in such small-space classes, it follows that compressed loops cannot decide languages that are $\mathsf{P}$-complete under log-space reductions. In contrast, polynomial-length CoT (i.e., generating $\ell = \mathrm{poly}(n)$ tokens) lifts the memory regime to $\mathrm{poly}(n)$ bits. Results from Li et al. and Merrill & Sabharwal prove that such architectures can simulate arbitrary polynomial-time algorithms, i.e., their reasoning power subsumes all of $\mathsf{P}$ (Zhang, 29 May 2026).
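A simple counting argument gives some intuition for why extra loop iterations cannot add capacity. It is a standard pigeonhole observation rather than something taken from the cited paper: a deterministic update on M bits of state has at most 2^M distinct states, so it must revisit one within 2^M steps and cycle from then on. The sketch below checks this on a toy update rule; `toy_update`, `first_repeat_step`, and the choice M = 8 are hypothetical and chosen only for illustration.

```python
def first_repeat_step(update, state, max_steps):
    """Return the first step at which a previously seen state recurs, else None."""
    seen = {state}
    for step in range(1, max_steps + 1):
        state = update(state)
        if state in seen:
            return step
        seen.add(state)
    return None


M = 8                 # memory budget in bits (illustrative assumption)
MASK = (1 << M) - 1   # keeps the state within M bits


def toy_update(state: int) -> int:
    """Arbitrary deterministic update on an M-bit state (not a real model)."""
    return (state * 5 + 3) & MASK


# Pigeonhole: a repeat must occur within 2**M steps, whatever the update is.
repeat_at = first_repeat_step(toy_update, 0, max_steps=2 ** M)
print(repeat_at)
```

Running more than 2^M iterations therefore only replays states the loop has already visited, whereas an append-only scratchpad never returns to an earlier configuration.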

3. Empirical Probes: Memory Budget vs. Working-Memory Demand

To validate the theoretical separation, (Zhang, 29 May 2026) conducts targeted experiments:

  • Pointer-Chasing: In a functional-graph pointer-chasing task, parameterized by the number of nodes and the number of chains to follow, CLL models succeed only when the persistent slot budget matches the working-memory demand. When the budget falls short of that demand, accuracy collapses to near zero. Sequence-state loop models, with length-$n$ persistent memory, succeed across the whole range.
  • Associative Recall: In a key-value retrieval task, the performance of gated linear-attention loops transitions sharply from failure to perfect recall once the size of the recurrent state exceeds a threshold that scales with the number of key-value pairs. This mirrors the threshold effect predicted by the memory-budget constraints.

These results demonstrate that increasing the number of reasoning steps in a small-memory CLL does not confer additional “reasoning capacity” unless persistent memory scales to match task demand. In CoT, persistent memory grows dynamically with the scratchpad length $\ell$ to match that demand.
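The threshold behaviour in the pointer-chasing probe can be illustrated with a toy simulation. This is not a reproduction of the cited experiment: it uses no learned model, and the graph size, chain count, hop count, and the policy of never storing chains beyond the slot budget are all assumptions made for the sketch.

```python
import random


def chase(succ, start, hops):
    """Follow the functional graph `succ` for `hops` steps from `start`."""
    node = start
    for _ in range(hops):
        node = succ[node]
    return node


def bounded_tracker(succ, starts, hops, slots):
    """Advance chains in lockstep while holding at most `slots` live pointers."""
    live = list(starts[:slots])  # chains beyond the budget are never stored
    for _ in range(hops):
        live = [succ[v] for v in live]
    # Chains that did not fit in memory have no answer.
    return live + [None] * (len(starts) - len(live))


def exact_match(succ, starts, hops, slots):
    """True only if every chain's end point is reported correctly."""
    truth = [chase(succ, start, hops) for start in starts]
    return bounded_tracker(succ, starts, hops, slots) == truth


rng = random.Random(0)
num_nodes, num_chains, hops = 32, 4, 10      # illustrative sizes
succ = [rng.randrange(num_nodes) for _ in range(num_nodes)]
starts = rng.sample(range(num_nodes), num_chains)

results = {slots: exact_match(succ, starts, hops, slots) for slots in range(1, 7)}
print(results)
```

In this toy setting, exact-match success flips from failure to success exactly when the slot budget reaches the number of chains, and adding hops does nothing to help an under-provisioned tracker.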

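| Regime | Persistent Memory | Scales With Steps? | Capability |
|--------|-------------------|--------------------|------------|
| CLL | $s \cdot d \cdot p$ | No | Bounded-space |
| SSL | $n \cdot d \cdot p$ | Yes (linear in $n$) | Memory-rich (P) |
| CoT | $\ell \cdot d \cdot p$ | Yes (linear in $\ell$) | Memory-unbounded (P) |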

4. Mechanistic Role: CoT as a Growing Scratchpad

The essential difference revealed by (Zhang, 29 May 2026) is that CoT explanation transforms the autoregressive decoder into an external memory whose capacity is proportional to the number of generated tokens. Each explicit reasoning step is “written down” as a token that persists across all subsequent reasoning, forming a dynamic, append-only scratchpad. The looped transformer (with fixed-size latent state) is restricted to reusing a small workspace, recycling the same $s$ slots at each step without any growth.

This mechanistic insight underpins the use of CoT for simulating processes that require serial memory updates or tracking of multiple partial results. In practice, to track a given number of partial results, CoT can simply generate that many steps, each storing one fact, whereas a CLL requires a matching number of slots.
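The contrast between an append-only scratchpad and a fixed set of slots can be sketched with two small stores. The class names, the running-sum workload, and the evict-oldest policy for the fixed-slot store are assumptions chosen for illustration; the source does not specify how a compressed loop overwrites its slots.

```python
from collections import deque


class Scratchpad:
    """Append-only store: every fact written remains readable later."""

    def __init__(self):
        self.tokens = []

    def write(self, fact):
        self.tokens.append(fact)

    def read_all(self):
        return list(self.tokens)


class FixedSlots:
    """Fixed-size store: writing to a full store evicts the oldest fact."""

    def __init__(self, slots):
        self.buf = deque(maxlen=slots)

    def write(self, fact):
        self.buf.append(fact)

    def read_all(self):
        return list(self.buf)


def running_sums(values, store):
    """Write one partial result per step and return what is still retrievable."""
    total = 0
    for value in values:
        total += value
        store.write(total)
    return store.read_all()


print(running_sums([1, 2, 3, 4, 5], Scratchpad()))    # all five partial sums
print(running_sums([1, 2, 3, 4, 5], FixedSlots(2)))   # only the last two survive
```

With five partial results and two slots, the fixed store loses the first three, while the scratchpad keeps one entry per step.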

5. Architecture, Faithfulness, and Robustness

CoT explanation is most naturally realized as explicit token-level generation (standard autoregressive LLM decoding), but its utility depends on the faithfulness of the generated intermediate steps to the model’s internal reasoning. Diagnostic tools such as the Hypocrisy Gap (Shiromani et al., 14 Jan 2026) use sparse autoencoder probes to measure divergence between internal “belief” and CoT explanations, highlighting the risk of post-hoc rationalization. Empirical studies show that CoT rationales can either act as faithful windows into computation or as opportunistic justifications, depending on generation regime and external pressures.

From the architecture side, the contrast between small-space CLLs and scratchpad-based CoT decoders is sharp: the former can never accumulate more information than its fixed number of slots can hold, no matter how long the computation runs, while the latter accumulates context memory linearly with each output step, enabling simulation of any polynomial-time computation when the scratchpad is of polynomial length.

6. Practical Implications and Theoretical Synthesis

The memory-budget separation between CoT and compressed loops compels several practical and theoretical conclusions:

  • Use CoT for unbounded serial reasoning: Whenever the working-memory demand of a task exceeds fixed recurrent budgets (e.g., dynamic pointer tracking, arbitrary-length computation), only CoT can scale up reasoning capacity at inference.
  • Memory-rich regimes (SSL, full-sequence attention) close the gap: Full sequence-state loops that store a hidden vector at every input position operate closer to the “explicit scratchpad” regime and can approach CoT-level expressivity for certain tasks, at higher computational cost.
  • Compression trade-offs: Compressing reasoning steps into compact latent states provides significant speedups but risks hitting expressivity and sample-efficiency limits, as demonstrated by the order barrier and signal-decay effects in CoT compression frameworks (Li et al., 29 Jan 2026).
  • Selection of architecture depends on reasoning depth, seriality, and working-memory requirements: Fixed-space reasoners may suffice for bounded-memory tasks, but CoT prompting is essential when demand grows with input or process scale.

7. Schematic Summary

Diagrammatically, the two memory paradigms are distinguished by their state trajectories:

  • Compressed loop:

$z_0 \to z_1 \to \cdots \to z_T$ (fixed $s$ slots; memory does not grow)

  • CoT scratchpad:

prompt $\to$ prompt + 1 token $\to \cdots \to$ prompt + $\ell$ tokens (memory grows with each emitted token)

This separation provides both a theoretical and empirical foundation for the widespread effectiveness of chain-of-thought explanation as an inference-time mechanism for expanding model reasoning capacity beyond static parameter limits (Zhang, 29 May 2026).
