Chain-of-Thought as Latent Variable

Updated 9 March 2026

The paper introduces latent variable frameworks for chain-of-thought by formalizing intermediate reasoning steps as manipulable random variables.
It demonstrates that compressed and latent token representations can maintain accuracy while reducing verbosity in multi-step tasks.
The approach reveals causal and computational trade-offs, offering design guidelines for robust, multimodal, and sequential reasoning.

Chain-of-thought (CoT) as a latent variable recasts intermediate reasoning steps in LLMs as formal, manipulable random variables or structured hidden states. Unlike classical approaches that externalize reasoning in discrete, observable token traces, latent variable CoT frameworks treat these steps—whether they manifest as token sequences or continuous states—as stochastic or deterministic objects within the model's probabilistic or computational graph. This perspective enables both theoretical analysis and empirical probing of reasoning dynamics, causal structure, information flow, and capacity limits. Below, key principles, models, and empirical findings are presented.

1. Probabilistic Formulation: Reasoning as Latent Variable Inference

The latent variable view frames the CoT process as the joint modeling of input $x$, a sequence of latent intermediate steps $z = \langle z_1, \ldots, z_T \rangle$, and a final answer $y$, leading to a factorized distribution:

$P(y,z|x) = P(y|x, z) \prod_{t=1}^{T} P(z_t|x, z_{<t})$

This captures both explicit tokenized CoT (where $z_t$ are text tokens) and internal, latent representations (where $z_t$ may be vectors or other abstract states). Marginalizing over $z$ yields the answer probability, $P(y|x) = \sum_z P(y,z|x)$, which underlies training with weak or only final-answer supervision (Zhu et al., 8 May 2025).
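The factorization and the marginal can be checked by brute force on a toy model. The sketch below is illustrative only: the binary steps, the parity answer, and the probabilities 0.8 and 0.9 are assumptions chosen to keep enumeration small, and the function names are hypothetical.

```python
from itertools import product

T = 3  # number of latent steps; each z_t is a bit in this toy setup


def p_step(z_t, x, z_prev):
    """P(z_t | x, z_<t): assumed to favour repeating the previous step."""
    if not z_prev:
        return 0.5
    return 0.8 if z_t == z_prev[-1] else 0.2


def p_answer(y, x, z):
    """P(y | x, z): the answer matches the parity of z with probability 0.9."""
    return 0.9 if y == sum(z) % 2 else 0.1


def joint(y, z, x=None):
    """P(y, z | x) = P(y | x, z) * prod_t P(z_t | x, z_<t)."""
    p = 1.0
    for t, z_t in enumerate(z):
        p *= p_step(z_t, x, z[:t])
    return p * p_answer(y, x, z)


def marginal(y, x=None):
    """P(y | x) = sum_z P(y, z | x), enumerating all 2^T chains."""
    return sum(joint(y, z, x) for z in product((0, 1), repeat=T))


probs = {y: marginal(y) for y in (0, 1)}
print(probs)
```

Enumeration costs $2^T$ terms here and is intractable for real token sequences, which is why practical systems approximate the sum by sampling or variational methods.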

In variations such as visual reasoning or multimodal CoTs, $z$ may also represent non-textual embeddings (e.g., visual sketches in $\mathbb{R}^d$), constructed and consumed by language or vision-language models (Shao et al., 31 Jan 2026). Probabilistic inference is realized through variational approaches (ELBO maximization), amortized inference policies, or MCMC-based EM algorithms that average over (or learn from) high-likelihood chains conditioned on observed outcomes (Phan et al., 2023, Sun et al., 27 Oct 2025, Tang et al., 25 Mar 2025).
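The variational route can be written out explicitly. As a sketch, assume an inference distribution $q(z|x,y)$ over chains (the symbol $q$ is introduced here for illustration and does not appear in the formulation above). Applying Jensen's inequality to the marginal likelihood gives the evidence lower bound:

$\log P(y|x) = \log \sum_z q(z|x,y) \frac{P(y,z|x)}{q(z|x,y)} \geq \sum_z q(z|x,y) \log \frac{P(y,z|x)}{q(z|x,y)} = \mathrm{ELBO}(q)$

The gap is the KL divergence from $q$ to the true posterior over chains:

$\log P(y|x) - \mathrm{ELBO}(q) = \mathrm{KL}\left(q(z|x,y) \,\|\, P(z|x,y)\right) \geq 0$

The bound is therefore tight exactly when $q$ matches the posterior $P(z|x,y)$. For continuous $z$, the sums become integrals.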

2. Internalization, Compression, and Representation of CoT Variables

Latent variable CoT extends beyond the explicit emission of intermediate tokens to include compressed and internal representations:

Compressed CoT tokens: Only numerically or semantically essential tokens (e.g., intermediate results in arithmetic) are preserved, matching or even improving performance over verbose step traces. Tokens carrying no value (e.g., filler words) are superfluous for multi-step tasks (Zhu et al., 8 May 2025).
Latent tokenization: Sequences encoding intermediate results can be replaced by learned one-hot or distributed representations (e.g., a <LAT> token encoding a digit array or a compact vector summarizing an entire step). Models trained to consume and emit these latent forms maintain accuracy as long as the key variable information persists, demonstrating that the form (textual vs. latent) is less critical than the information content (Zhu et al., 8 May 2025).
Hybrid models: Mechanisms such as CoLT trade off between explicit reasoning and token bottlenecking by implementing "latent tool calls." Here, compressed latent surrogates for step-wise rationale are emitted, unpacked by small decoders, and re-injected as explicit text, yielding interpretability and efficiency (Zhu et al., 4 Feb 2026).
Continuous latent trajectories: In vision-language and continuous reasoning models, chains of hidden states (thought vectors) evolve iteratively, interleaving and aligning with multimodal input, and serving as the latent variables governing successive reasoning stages. This naturally supports high-entropy, information-rich intermediate states beyond word-level discretization (Shao et al., 31 Jan 2026, Pham et al., 18 Aug 2025).
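The idea behind compressed CoT tokens can be illustrated with a simple filter that keeps only value-carrying tokens. This is a heuristic sketch, not the compression procedure of the cited work: the regular expression, the function name `compress_trace`, and the example trace are all assumptions made for illustration.

```python
import re


def compress_trace(trace):
    """Keep only tokens that carry numeric intermediate values."""
    return re.findall(r"-?\d+(?:\.\d+)?", trace)


verbose = (
    "First, 12 times 30 is 360. Next, 12 times 4 is 48. "
    "So the total is 360 plus 48, which is 408."
)
compressed = compress_trace(verbose)
print(compressed)
# → ['12', '30', '360', '12', '4', '48', '360', '48', '408']
```

The filler words vanish while every operand, partial product, and the final result survive, which is the information the compressed form is meant to preserve.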

3. Causal, Algorithmic, and Computational Properties

Viewing CoT as a sequence of variable states highlights their computational and causal roles:

Program-variable analogy: In compositional tasks (e.g., multiplication, dynamic programming), CoT tokens function as program variables—mutable slots storing and propagating intermediate results. Causal interventions, where these variables are overwritten mid-generation, reliably alter downstream reasoning and outcomes in ways consistent with causal semantics (≈74% success), confirming that these tokens are computationally functional and mutable (Zhu et al., 8 May 2025).
Information bottlenecks and complexity ceilings: When multiple reasoning substeps are compressed into one latent variable, linear probes are increasingly unable to recover the full variable content, and performance degrades. This empirically defines a computation-per-variable ceiling: each token or latent state can stably encode only a limited amount of complexity before fidelity collapses (Zhu et al., 8 May 2025).
Sequential and non-local propagation: In models such as Coconut and CODI, latent CoT steps exhibit both staged (sequential) and skip-connected (non-local) information pathways, in contrast to the strictly local, diagonal structure of explicit CoT. Not all steps are equally causal—"high-leverage" steps can have disproportionate impact, unlike depth-uniform residual blocks (Li et al., 9 Feb 2026).
Internal commitment and planning horizon: Linear probes reveal that LLMs may internally "decide" outcomes within a handful of early CoT steps (probe AUC ≈0.84 early in the trace), even if verbalized traces are much longer. However, probe analyses (Tele-Lens) generally reveal myopic local planning: most models do not internalize a global plan but instead rely on sequential, short-horizon updates, with minimal final-answer prediction power until the output stage (David, 3 Nov 2025, Xu et al., 2 Feb 2026).
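The program-variable analogy and the intervention protocol can be made concrete with an ordinary program. In the sketch below, partial products of a long multiplication play the role of CoT variables, and an intervention overwrites one of them mid-computation. The function `multiply_with_trace` and its `intervene` parameter are hypothetical. Because the program is deterministic, the intervention propagates every time, whereas the ≈74% figure above concerns a trained model.

```python
from typing import Dict, List, Optional, Tuple


def multiply_with_trace(
    a: int, b: int, intervene: Optional[Dict[int, int]] = None
) -> Tuple[List[int], int]:
    """Multiply a by b digit by digit, recording each partial product.

    `intervene` maps a step index to a value that overwrites that step's
    partial product before it is used downstream.
    """
    intervene = intervene or {}
    trace = []
    for i, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** i
        partial = intervene.get(i, partial)  # causal intervention, if any
        trace.append(partial)
    return trace, sum(trace)


clean_trace, clean_answer = multiply_with_trace(12, 34)
patched_trace, patched_answer = multiply_with_trace(12, 34, intervene={0: 50})
print(clean_trace, clean_answer)      # → [48, 360] 408
print(patched_trace, patched_answer)  # → [50, 360] 410
```

The final answer shifts by exactly the amount injected into the overwritten variable (50 instead of 48), which is the signature of a causally functional intermediate state.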

4. Learning Algorithms and Training Objectives

To support latent variable CoT reasoning, models employ a range of specialized learning algorithms:

Supervised fine-tuning: Dense supervision on intermediate CoT steps is effective but expensive; compressed and latent forms allow the omission of surface details.
Variational and amortized inference: Training maximizes the ELBO with respect to both generative and inference networks (e.g., GFlowNets, CVAEs). In practice, policies are trained to sample or reconstruct high-likelihood rationales under sparse or noisy reward signals (Phan et al., 2023, Wu et al., 10 Jul 2025, Sun et al., 27 Oct 2025).
Expectation-Maximization (EM) and control variates: For settings where rationales are unobserved, MCMC-EM (TRICE) alternates between sampling plausible chains under current policies and updating model parameters via importance-weighted gradients, with control-variate techniques reducing estimator variance as training converges (Phan et al., 2023).
Adaptive computation and halting: Some models support token-level, content-adaptive latent step allocation, where "easy" tokens get short or trivial latent CoT traces and "hard" tokens invoke additional latent computation. This enables per-token dynamic compute scheduling, balancing accuracy and compute cost (Zeng et al., 9 Feb 2026).
Search and inference-time rethinking: Decoupling reasoning (latent trajectory) from verbalization allows models to perform gradient-based refinement of latent plans at inference, iteratively improving solutions via gradient ascent or variational updates, outperforming much larger static models (Kong et al., 6 Feb 2026, Wang et al., 29 Jan 2026).
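Inference-time refinement of a latent plan can be sketched as plain gradient ascent on a score function. The example below is a toy under strong assumptions: the quadratic `score` stands in for a learned reward or likelihood, the gradient is written by hand rather than obtained by automatic differentiation, and none of the names come from the cited systems.

```python
def score(z, target):
    """Assumed reward: negative squared distance to a target latent."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def grad_score(z, target):
    """Analytic gradient of `score` with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]


def refine(z, target, lr=0.1, steps=50):
    """Gradient ascent on the latent while model weights stay fixed."""
    for _ in range(steps):
        g = grad_score(z, target)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


target = [1.0, -2.0]
z0 = [0.0, 0.0]
z_refined = refine(z0, target)
print(score(z0, target), score(z_refined, target))
```

With this step size each update shrinks the distance to the target by a factor of 0.8, so the score improves monotonically. Only the latent changes, which is what separates inference-time refinement from further training.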

5. Theoretical Limits, Trade-offs, and Performance Analysis

Latent CoT systems expose both fundamental trade-offs and limits:

Exploration–execution trade-off: High decisional certainty in latent steps yields accurate execution (single-solution focus) but suppresses exploration; low certainty enhances diversity (multi-path exploration) but allows error accumulation. The "Symbolic Index" quantifies this commitment, governing the model's ability to balance search and precision (Zou et al., 1 Feb 2026).
Sample and approximation complexity: Statistical learning theory demonstrates that, when the pretraining dataset is sufficiently large, CoT prompting approximates Bayesian model averaging, and error decays exponentially in the number of CoT demonstrations. Transformer network approximation error can also decay exponentially with depth, up to a plateau set by finite pretraining (Hu et al., 2024).
Necessity of curriculum learning: Direct training of latent CoT models leads to distributional mismatch and permanent performance gaps. Curricula that progressively anchor latent step states to expert trajectories are theoretically necessary to close this gap and achieve optimal reasoning performance (Zou et al., 1 Feb 2026).
Diversity and robustness: Latent CoT models, especially those decoupled from surface generation (e.g., PLaT), yield broader solution manifolds, sustaining higher entropy and branching factor during sampling, and thus supporting richer search-based inference (e.g., Tree-of-Thought methods) (Wang et al., 29 Jan 2026, Li et al., 9 Feb 2026).
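The exploration–execution trade-off can be illustrated with Shannon entropy over candidate next steps. The sketch below is not the "Symbolic Index" of the cited work, whose definition is not given here. It only contrasts a committed distribution with an exploratory one, and the probabilities are made up for illustration.

```python
import math


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def effective_branching(p):
    """Perplexity 2^H: the number of equally likely options with the same entropy."""
    return 2 ** entropy_bits(p)


committed = [0.97, 0.01, 0.01, 0.01]    # high certainty, execution-focused
exploratory = [0.25, 0.25, 0.25, 0.25]  # low certainty, multi-path search

for name, dist in [("committed", committed), ("exploratory", exploratory)]:
    print(name, round(entropy_bits(dist), 3), round(effective_branching(dist), 2))
```

The uniform distribution has 2 bits of entropy and an effective branching factor of 4, while the committed one stays close to a single path. Sustained higher entropy of this kind is what supports search-based inference.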

6. Multimodal and Skill-Structured Extensions

Latent CoT frameworks generalize across modalities and to structured reasoning skills:

Multimodal latent CoT: Reasoning across modalities (text, vision) is achieved by interleaving latent representations (e.g., visual sketches) and text. Latent blocks are generated, aligned, and decoded using specialized diffusion or attention modules for visual content, with corresponding joint losses over text and latent reconstruction (Shao et al., 31 Jan 2026, Pham et al., 18 Aug 2025).
Skill-discovery and policy-based selection: Approaches such as Latent Reasoning Skills (LaRS) introduce compact continuous "skill" latent variables. Policies over this skill space enable rapid, similarity-based matching of demonstrations, improving few-shot performance and retrieval efficiency (Xu et al., 2023).
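Similarity-based matching in a skill space can be sketched with cosine similarity over small vectors. The skill names, the three-dimensional embeddings, and the `retrieve` helper below are hypothetical. In LaRS the skill variables are learned, whereas here they are written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical demonstration bank: name -> skill embedding.
demo_skills = {
    "add_fractions": [0.9, 0.1, 0.0],
    "unit_conversion": [0.1, 0.9, 0.2],
    "date_arithmetic": [0.0, 0.2, 0.9],
}


def retrieve(query_skill, bank, k=1):
    """Return the k demonstrations whose skill vectors best match the query."""
    ranked = sorted(
        bank, key=lambda name: cosine(query_skill, bank[name]), reverse=True
    )
    return ranked[:k]


top = retrieve([0.8, 0.2, 0.1], demo_skills, k=2)
print(top)  # → ['add_fractions', 'unit_conversion']
```

Matching in a compact skill space avoids comparing full rationale texts, which is one reason such retrieval can be fast.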

7. Implications and Design Recommendations

A view of CoT as a latent variable yields actionable guidelines:

Retain only variables or tokens that carry intermediate values; excise or compress filler steps for efficiency.
Compress computation into as few latent steps as possible, but stop before sequence-length or per-variable complexity limits are reached.
Be alert to causal chain "shortcuts" or trivial subproblem copying, as these undermine reasoning faithfulness.
Explicitly monitor and possibly regularize causal sensitivity, step necessity, and representational commitment across the latent trajectory to support robust, interpretable, and stable reasoning (Zhu et al., 8 May 2025, Li et al., 9 Feb 2026).

In conclusion, the latent variable perspective on chain-of-thought provides a precise mathematical and empirical framework for understanding, training, and controlling multi-step reasoning in large models. CoT tokens and representations function as mutable, causally potent states—subject to computational, statistical, and architectural trade-offs—and unify explicit, hybrid, and internalized reasoning within a common formalism (Zhu et al., 8 May 2025, Shao et al., 31 Jan 2026, Zhu et al., 4 Feb 2026, Wu et al., 10 Jul 2025, Zou et al., 1 Feb 2026).