Chain-of-Thought Explanation
- Chain-of-Thought explanation is a reasoning paradigm where models explicitly generate sequential steps as an expanding scratchpad for complex tasks.
- It contrasts fixed-size memory systems like compressed loops (CLL) with token-level memory growth, offering unbounded expressivity and adaptability.
- Empirical studies show that CoT enables models to simulate polynomial-time computations and to adjust dynamically to tasks that require extensive working memory.
Chain-of-thought (CoT) explanation is a reasoning paradigm in which a model, typically a large language model (LLM) or another transformer-based architecture, is prompted or trained to explicitly generate a sequence of intermediate reasoning steps, each constituting a step in its decision process, before arriving at a final answer. CoT explanations serve as an externalized “scratchpad” or memory, revealing the model’s reasoning trajectory and often enabling modular, transparent, and more powerful multi-step inference.
1. Formal Definitions and Memory Regimes
The mechanistic underpinnings of CoT are best clarified by comparison with recurrent (looped) transformer architectures, as detailed in (Zhang, 29 May 2026). Three memory regimes at inference time can be formalized:
- Compressed Latent Loop (CLL[s,d,p,T]): Reasoning state is maintained in a fixed number of slots $s$ that are updated recurrently, so persistent memory has a fixed size in bits. This memory is independent of the number of reasoning steps $T$ and of any CoT length.
- Full Sequence-State Loop (SSL[d,p,T]): The recurrent state is a sequence of vectors, one per input position, so its size in bits scales with the input length. Persistent memory grows linearly with input size, allowing the reasoning “scratchpad” to be linear in size.
- Chain-of-Thought Scratchpad (CoT[d,p,]): The model emits tokens as its scratchpad, each appended as a new context token. Persistent memory is proportional to the number of tokens in the context, and crucially grows without bound during inference as the CoT length increases.
CLL models remain fundamentally “small-space” reasoners; for any given slot budget $s$, no increase in the number of looped reasoning steps $T$ augments the effective size of their memory. In contrast, growing the CoT scratchpad via token generation yields unbounded memory accumulation at inference, an expressivity regime provably beyond the reach of fixed-size loops.
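The three regimes can be compared with a small memory-accounting sketch. It is illustrative only: it assumes that `d` is the width of each state vector, that `p` is the number of bits per entry, and that every slot or context token holds `d * p` bits. The source names these parameters but the exact formulas are not recoverable from it, so the function names and the accounting below are assumptions.

```python
def cll_memory_bits(s: int, d: int, p: int, T: int) -> int:
    """Compressed latent loop: s slots of d entries at p bits each (assumed).

    T (the number of loop steps) is accepted only to make explicit
    that it has no effect on the persistent memory size.
    """
    return s * d * p


def ssl_memory_bits(n: int, d: int, p: int) -> int:
    """Sequence-state loop: one d-entry vector per input position (assumed)."""
    return n * d * p


def cot_memory_bits(n: int, generated: int, d: int, p: int) -> int:
    """CoT scratchpad: one d-entry vector per context token (assumed).

    The context holds the n input tokens plus every generated token.
    """
    return (n + generated) * d * p


if __name__ == "__main__":
    s, d, p, n = 4, 8, 16, 10
    for steps in (1, 10, 100):
        print(
            steps,
            cll_memory_bits(s, d, p, T=steps),
            ssl_memory_bits(n, d, p),
            cot_memory_bits(n, generated=steps, d=d, p=p),
        )
```

Under these assumptions only the last column grows as the number of steps increases, which is the qualitative point of the comparison.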
2. Computational Expressivity and Complexity Separation
Memory bounds and complexity: The theoretical expressivity of these regimes is established by simulating their workspace in Turing machine terms. Any CLL[s,d,p,T] can be simulated by a machine whose work tape is bounded in terms of the loop's persistent memory. For sufficiently small slot budgets and precision, this restricts the architecture to sub-polynomial space.
Under the standard complexity assumption that this small-space class does not contain P, it follows that compressed loops cannot decide P-complete languages under log-space reductions. In contrast, polynomial-length CoT (i.e., generating polynomially many tokens) lifts the memory regime to polynomially many bits. Results from Li et al. and Merrill & Sabharwal prove that such architectures can simulate arbitrary polynomial-time algorithms, i.e., their reasoning power subsumes all of P (Zhang, 29 May 2026).
3. Empirical Probes: Memory Budget vs. Working-Memory Demand
To validate the theoretical separation, (Zhang, 29 May 2026) conducts targeted experiments:
- Pointer-Chasing: In a functional-graph pointer-chasing task with multiple chains, CLL models succeed only when the persistent slot budget matches the working-memory demand. When the slot budget falls below that demand, accuracy collapses to near zero. Sequence-state loop models, whose persistent memory scales with input length, succeed across the whole range.
- Associative Recall: In a key-value retrieval task, performance of gated linear-attention loops transitions sharply from failure to perfect recall as the size of the recurrent state exceeds a threshold that scales with the number of key-value pairs. This mirrors the theoretical threshold effect derived from memory-budget constraints.
These results demonstrate that increasing the number of reasoning steps in a small-memory CLL does not confer additional “reasoning capacity” unless persistent memory scales to match task demand. In CoT, persistent memory grows dynamically with the number of generated tokens to accommodate arbitrary complexity.
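The threshold behaviour in the pointer-chasing probe can be illustrated with a toy simulation. This is a minimal sketch and not the experimental setup of the cited work: the graph, the function names, and the rule that chains beyond the slot budget are simply dropped are all assumptions made for illustration.

```python
import random


def make_functional_graph(num_nodes: int, seed: int = 0) -> list:
    """Random functional graph: f[i] is the single successor of node i."""
    rng = random.Random(seed)
    return [rng.randrange(num_nodes) for _ in range(num_nodes)]


def chase_reference(f: list, starts: list, hops: int) -> list:
    """Ground truth: follow each chain for `hops` steps with no memory limit."""
    ends = []
    for node in starts:
        for _ in range(hops):
            node = f[node]
        ends.append(node)
    return ends


def chase_with_slots(f: list, starts: list, hops: int, num_slots: int) -> list:
    """Chase all chains in lockstep while holding at most num_slots pointers.

    Chains that do not fit in the slot budget are dropped (toy assumption),
    so their final node is reported as None.
    """
    slots = dict(list(enumerate(starts))[:num_slots])
    for _ in range(hops):
        for chain in slots:
            slots[chain] = f[slots[chain]]
    return [slots.get(chain) for chain in range(len(starts))]


def exact_match(f: list, starts: list, hops: int, num_slots: int) -> bool:
    """True only if every chain's end node is recovered."""
    return chase_with_slots(f, starts, hops, num_slots) == chase_reference(f, starts, hops)


if __name__ == "__main__":
    f = make_functional_graph(32, seed=1)
    starts = [0, 5, 9, 17]  # four chains, so working-memory demand is 4
    for num_slots in range(1, 7):
        print(num_slots, exact_match(f, starts, hops=6, num_slots=num_slots))
```

In this toy setting the number of hops never changes the outcome: exact match depends only on whether the slot budget covers the number of chains.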
| Regime | Persistent Memory | Scales With Steps? | Capability |
|---|---|---|---|
| CLL | Fixed ($s$ slots) | No | Bounded-space |
| SSL | Linear in input length | Yes (linear in input length) | Memory-rich (P) |
| CoT | Grows with generated tokens | Yes (linear in CoT length) | Memory-unbounded (P) |
4. Mechanistic Role: CoT as a Growing Scratchpad
The essential difference revealed by (Zhang, 29 May 2026) is that CoT explanation transforms the autoregressive decoder into an external memory whose capacity is proportional to the number of generated tokens. Each explicit reasoning step is “written down” as a token that persists across all subsequent reasoning, forming a dynamic, append-only scratchpad. The looped transformer (with fixed-size latent state) is restricted to reusing a small workspace, recycling the same $s$ slots at each step without any growth.
This mechanistic insight underpins the use of CoT for simulation of processes that require serial memory updates or tracking of multiple partial results. In practice, for $k$ partial results, CoT can simply generate $k$ steps, each storing one fact, whereas a CLL requires at least $k$ slots.
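The contrast between an append-only scratchpad and a recycled fixed workspace can be sketched as two tiny memory classes. The class names and the round-robin overwrite policy are hypothetical choices for illustration; a real looped transformer updates its slots through learned computation, not by a fixed overwrite rule.

```python
class FixedSlots:
    """Fixed workspace: new facts overwrite old ones in round-robin order."""

    def __init__(self, num_slots: int):
        self.slots = [None] * num_slots
        self.writes = 0

    def write(self, fact: str) -> None:
        # Reuse the same slots on every pass; capacity never grows.
        self.slots[self.writes % len(self.slots)] = fact
        self.writes += 1

    def contents(self) -> list:
        return [fact for fact in self.slots if fact is not None]


class Scratchpad:
    """Append-only scratchpad: every written fact stays readable."""

    def __init__(self):
        self.tokens = []

    def write(self, fact: str) -> None:
        self.tokens.append(fact)

    def contents(self) -> list:
        return list(self.tokens)


def surviving_facts(memory, facts: list) -> set:
    """Write all facts in order and return the ones still retrievable."""
    for fact in facts:
        memory.write(fact)
    return set(memory.contents())


if __name__ == "__main__":
    facts = [f"r{i}" for i in range(6)]
    print(sorted(surviving_facts(FixedSlots(3), facts)))
    print(sorted(surviving_facts(Scratchpad(), facts)))
```

With six partial results, the three-slot workspace retains only the three most recent facts, while the scratchpad retains all six.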
5. Architecture, Faithfulness, and Robustness
CoT explanation is most naturally realized as explicit token-level generation (standard autoregressive LLM decoding), but its utility depends on the faithfulness of the generated intermediate steps to the model’s internal reasoning. Diagnostic tools such as the Hypocrisy Gap (Shiromani et al., 14 Jan 2026) use sparse autoencoder probes to measure divergence between internal “belief” and CoT explanations, highlighting the risk of post-hoc rationalization. Empirical studies show that CoT rationales can either act as faithful windows into computation or as opportunistic justifications, depending on generation regime and external pressures.
From the architecture side, the contrast between small-space CLLs and scratchpad-based CoT decoders is sharp: the former can never accumulate more information than the fixed number of slots allows, no matter how long the computation, while the latter accumulates context memory linearly with each output step, enabling simulation of any polynomial-time computation when polynomially many steps are generated.
6. Practical Implications and Theoretical Synthesis
The memory-budget separation between CoT and compressed loops compels several practical and theoretical conclusions:
- Use CoT for unbounded serial reasoning: Whenever the working-memory demand of a task exceeds a fixed recurrent budget (e.g., dynamic pointer tracking, arbitrary-length computation), a compressed loop cannot scale up its reasoning capacity at inference, whereas CoT can.
- Memory-rich regimes (SSL, full-sequence attention) close the gap: Full sequence-state loops that store a hidden vector at every input position operate closer to the “explicit scratchpad” regime and can approach CoT-level expressivity for certain tasks, at higher computational cost.
- Compression trade-offs: Compressing reasoning steps into compact latent states provides significant speedups but incurs the risk of hitting expressivity and sample efficiency limits, as demonstrated by the order-3 barrier and signal decay effects in CoT compression frameworks (Li et al., 29 Jan 2026).
- Selection of architecture depends on reasoning depth, seriality, and working-memory requirements: Fixed-space reasoners may suffice for bounded-memory tasks, but CoT prompting is essential when demand grows with input or process scale.
7. Schematic Summary
Diagrammatically, the two memory paradigms are distinguished by their state trajectories:
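- Compressed loop:
the same fixed set of $s$ slots is overwritten at every step (memory does not grow)
- CoT scratchpad:
each generated token is appended to the context (memory grows with each generated token)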
This separation provides both a theoretical and empirical foundation for the widespread effectiveness of chain-of-thought explanation as an inference-time mechanism for expanding model reasoning capacity beyond static parameter limits (Zhang, 29 May 2026).