This paper, "Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning" (Chen et al., 22 May 2025), provides an extensive overview of the emerging field of latent Chain-of-Thought (CoT) reasoning in LLMs. It addresses the limitations of conventional CoT, which relies on explicitly verbalized natural language reasoning steps, leading to inefficiency and constraints on abstract reasoning. Latent CoT proposes that LLMs can perform reasoning internally within latent spaces, decoupling the reasoning process from language. This approach promises richer cognitive representations, more flexible reasoning, and faster inference.
The authors introduce a unified taxonomy to structure the research in latent CoT, categorizing it into four main perspectives:
- Token-wise Strategies: How special tokens (discrete or continuous) are used to guide or represent latent reasoning.
- Internal Mechanisms: How model architectures and representational schemes support implicit reasoning.
- Analysis: Efforts to understand and interpret the internal reasoning processes.
- Applications: Real-world uses of latent CoT.
Token-wise Strategies are divided into:
- Discrete Tokens: These are symbolic markers used to structure or trigger latent reasoning.
  - Implementation: Involves introducing special tokens such as "[pause]" tokens (Goyal et al., 2023), "planning tokens" (Wang et al., 2023), "thinking tokens" (Herel et al., 2024), or "filler tokens" (Pfau et al., 2024). These tokens can be predefined or learnable (a minimal pause-token sketch follows this list).
  - Examples:
    - Quiet-STaR [zelikman2024quietstar] uses learnable tokens to mark boundaries of internal rationales, enabling models to infer unstated steps.
    - BoLT [ruan2025reasoning] models the thought process as a trainable latent variable.
    - Reasoning CPT [ishibashi2025mininghiddenthoughtstexts] uses continual pretraining with synthetic data containing hidden thoughts.
    - Compression-based methods, such as those using VQ-VAEs (Su et al., 2025), condense reasoning steps into discrete latent tokens.
    - PHD-Transformer [wu2025EPLS] uses hidden decoding tokens for efficient length scaling without increasing the KV cache.
  - Practical Insight: The structural arrangement of these tokens often matters more than their specific semantic content.
- Continuous Tokens: These are learned embeddings that facilitate implicit reasoning, representing intermediate states as trajectories in a high-dimensional latent space.
  - Post-training Methods: Adapt existing LLMs with minimal additional data.
    - Intrinsic Methods: The LLM generates and consumes continuous tokens internally.
      - COCONUT [hao2024traininglargelanguagemodels]: Feeds the model's last hidden state back as its next input embedding for latent iteration (a minimal latent-iteration sketch follows this list).
      - CODI [shen2025codi]: Uses self-distillation to align the student model's hidden activations with a teacher's explicit-CoT hidden states.
      - LightThinker [zhang2025lightthinkerthinkingstepbystepcompression]: The model learns to compress reasoning into latent "gist" tokens.
    - Auxiliary Methods: An external module generates continuous tokens that are injected into the main LLM.
      - HCoT [liu2024expeditingelevatinglargelanguage]: An auxiliary CoT model generates and compresses thoughts into a special token.
      - CCoT [cheng2024compressed]: Encodes reasoning sequences into variable-length latent embeddings ("contemplation tokens").
      - SoftCoT [xu2025softcot]: A frozen assistant model and a trained projection layer generate "soft tokens" for a frozen LLM. SoftCoT++ [xu2025softcottesttimescalingsoft] extends this for test-time scaling and diversity.
    - Limitation: Performance often matches, but does not significantly exceed, that of explicit CoT.
  - Pre-training Methods: Embed latent reasoning during pre-training.
    - CoCoMix [tack2025cocomix]: Mixes continuous "concepts" (extracted from activations via a sparse autoencoder) into hidden states during pre-training, creating a latent scaffold.
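To ground the discrete-token strategies above, here is a minimal PyTorch/Transformers sketch of appending learnable "[pause]" tokens after a prompt, in the spirit of pause/filler-token methods. The model choice ("gpt2"), the number of pause tokens, and the loss-masking note are illustrative assumptions, not the recipe of any specific surveyed paper.

```python
# Minimal sketch: learnable [pause] tokens appended after the prompt
# (in the spirit of pause/filler-token methods; hyperparameters are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 1) Register a new [pause] token; its embedding is randomly initialized and learnable.
tokenizer.add_special_tokens({"additional_special_tokens": ["[pause]"]})
model.resize_token_embeddings(len(tokenizer))
pause_id = tokenizer.convert_tokens_to_ids("[pause]")

# 2) Insert k pause tokens after the question so the model gets extra forward
#    passes of "silent" computation before it starts emitting the answer.
k = 8
question = "If a train travels 60 miles in 1.5 hours, what is its average speed?"
q_ids = tokenizer(question, return_tensors="pt").input_ids
pause_ids = torch.full((1, k), pause_id, dtype=torch.long)
input_ids = torch.cat([q_ids, pause_ids], dim=1)

# 3) During fine-tuning, the loss would be masked over the pause positions
#    (labels = -100 there) so only the answer tokens are supervised.
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```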
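Similarly, the intrinsic continuous-token loop (COCONUT-style) can be sketched by feeding the final hidden state back as the next input embedding for a few latent steps before resuming ordinary token decoding. This is a simplified illustration under assumed interfaces (a Hugging Face causal LM with `inputs_embeds` and `output_hidden_states`), not the authors' implementation; it also recomputes the full sequence each step rather than using a KV cache.

```python
# Minimal sketch of COCONUT-style latent iteration: feed the final hidden state
# back in as the next input embedding for a few "thought" steps, then decode.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Q: 17 + 25 = ?\nA:"
ids = tokenizer(prompt, return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(ids)      # (1, T, d)

num_latent_steps = 4
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]  # final layer, last position
        # Re-inject the hidden state as a "continuous thought" embedding.
        inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)

    # Switch back to language mode: decode greedily from the latent context.
    generated = []
    for _ in range(8):
        logits = model(inputs_embeds=inputs_embeds).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)           # (1, 1)
        generated.append(next_id)
        next_embed = model.get_input_embeddings()(next_id)
        inputs_embeds = torch.cat([inputs_embeds, next_embed], dim=1)

print(tokenizer.decode(torch.cat(generated, dim=1)[0]))
```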
Internal Mechanisms explore how reasoning can emerge implicitly:
- Structural CoT: Focuses on architectural designs that support latent reasoning, such as depth, recurrence, and looping computations.
  - Implementation: Deeper, narrower models often outperform wider ones for reasoning. Recurrent architectures or looped transformers allow iterative refinement of latent representations.
  - Examples:
    - CoTFormer [mohtashami2024cotformerchainofthoughtdrivenarchitecture]: Interleaves and loops representations to emulate CoT.
    - Huginn [scaling_up_test_time]: A depth-recurrent framework that allows dynamic allocation of compute at test time.
    - RELAY [RELAY]: Aligns CoT steps with loop iterations in a Looped Transformer, using intermediate supervision.
    - Inner Thinking Transformer (ITT) [inner_thinking_transformer]: Treats each Transformer layer as a reasoning step with adaptive token routing.
  - Key Idea: Increased effective depth, through stacking or shared-weight recurrence, supports latent reasoning (a minimal looped-block sketch follows this list).
- Representational CoT: Focuses on embedding reasoning processes directly into the model's hidden states.
  - Implementation: Achieved through specialized fine-tuning or distillation.
  - Examples:
    - STaR [zelikman2022starbootstrappingreasoningreasoning]: Rationale-augmented fine-tuning that bootstraps from the model's own generated rationales.
    - ICoT [deng2023distillcot]: Knowledge distillation in which student models emulate the teacher's hidden-state trajectories from explicit CoT.
    - Stepwise Internalization [deng2024explicit]: Phased fine-tuning that gradually internalizes CoT.
    - System 2 Distillation [weston2024system2]: Self-distillation of explicit reasoning into implicit reasoning pathways.
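As a rough illustration of the structural idea (shared-weight recurrence as a substitute for added depth), the sketch below loops a single transformer layer several times to refine a latent state. The layer choice (`nn.TransformerEncoderLayer`), the per-iteration step embedding, and all hyperparameters are stand-ins, not the architecture of any surveyed model.

```python
# Sketch of a weight-shared "looped" block: one transformer layer applied K times,
# so effective depth (and latent refinement) grows without adding parameters.
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_loops=6):
        super().__init__()
        self.num_loops = num_loops
        # A single shared layer stands in for the recurrent core.
        self.core = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.step_embed = nn.Embedding(num_loops, d_model)  # per-iteration signal

    def forward(self, x):                       # x: (batch, seq, d_model)
        for t in range(self.num_loops):
            # Tell the shared weights which iteration they are on, then refine.
            x = self.core(x + self.step_embed.weight[t])
        return x

block = LoopedBlock()
tokens = torch.randn(2, 16, 256)                # stand-in token embeddings
refined = block(tokens)
print(refined.shape)                            # torch.Size([2, 16, 256])
```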
Analysis and interpretability work explores whether LLMs perform genuine step-by-step reasoning internally:
- Internal Computation Interpretation: Studies suggest LLMs can perform implicit multi-step reasoning. Evidence includes recovering reasoning trees from attention patterns [hou-etal-2023-towards], emergent recurrent computation in transformers [brinkmann2024mechanisticanalysistransformertrained], and hidden states encoding multiple reasoning paths [shalev2024distributionalreasoningLLMsparallel].
- Shortcut Mechanisms: Research indicates that correct outputs might stem from shallow heuristics or pattern completion rather than deep reasoning. Examples include answers being decodable from early layers [din2024jumpconclusionsshortcuttingtransformers] or models learning to skip steps [liu2024languagemodelslearnskip] (a simple probing sketch follows this list).
- Latent Reasoning Dynamics: This area aims to characterize latent reasoning. Discoveries include a "latent CoT vector" that can elicit CoT behavior without explicit prompts [zhang2025uncoveringlatentchainthought] and methods like Chain-of-Embedding (CoE) to analyze hidden-state trajectories [wang2025latentspacechainofembeddingenables].
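To make the "answers decodable from early layers" observation concrete, below is a generic logit-lens-style probe that projects each layer's hidden state through the model's output head to see where the eventual answer becomes readable. The GPT-2-specific attribute names (`transformer.ln_f`, `lm_head`) and the prompt are illustrative assumptions; this is not the cited papers' exact methodology.

```python
# Logit-lens-style sketch: project each layer's hidden state at the last position
# through the unembedding matrix and inspect the top prediction per layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The capital of France is"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    out = model(ids, output_hidden_states=True)
    for layer, h in enumerate(out.hidden_states):       # 0 = embeddings, then each block
        # Apply the final layer norm before unembedding (standard logit-lens convention);
        # attribute names below are GPT-2-specific.
        h_last = model.transformer.ln_f(h[:, -1, :])
        logits = model.lm_head(h_last)                   # reuse the output head
        token = tokenizer.decode(logits.argmax(dim=-1))
        print(f"layer {layer:2d} -> {token!r}")
```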
Applications of latent CoT reasoning include:
- Textual Reasoning: Applied to mathematical, commonsense, and logical multi-hop reasoning tasks.
- Multimodal Reasoning and Generation:
  - Heima [shen2025efficientreasoninghiddenthinking]: Uses latent "thinking tokens" for multimodal tasks.
  - XS-CoT [xue2025enhancing]: Hides cross-lingual speech reasoning in a semi-implicit token schedule.
  - LatentLM [sun2024multimodall]: Treats modalities as latent tokens to provide a unified interface.
- Retrieval-Augmented Generation (RAG) and Recommendation:
  - DEBATER [ji2025learningeffectiverepresentationsdense]: Incorporates Chain-of-Deliberation (CoD), using prompt tokens for latent reasoning in dense retrieval.
  - ReaRec [tang2025thinkrecommendunleashinglatent]: Performs latent reasoning by recursively feeding hidden states back into the model for user-interest modeling in recommendation (see the sketch below this list).
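As a loose illustration of the recursive hidden-state pattern in recommendation, the sketch below refines a GRU-encoded user state by feeding it back as a pseudo-interaction for a few latent steps and then scores items by dot product. Every component here (the GRU encoder, the feedback scheme, the scoring) is an assumption chosen for brevity, not ReaRec's actual design.

```python
# Generic sketch of latent "reasoning" in sequential recommendation: encode the
# interaction history, recursively refine the user state by feeding it back as an
# extra input step, then score items by dot product. (Illustrative only.)
import torch
import torch.nn as nn

num_items, d = 1000, 64
item_emb = nn.Embedding(num_items, d)
encoder = nn.GRU(input_size=d, hidden_size=d, batch_first=True)

history = torch.randint(0, num_items, (1, 10))          # one user's last 10 item ids
seq = item_emb(history)                                  # (1, 10, d)

_, user_state = encoder(seq)                             # final hidden state, (1, 1, d)
num_latent_steps = 3
for _ in range(num_latent_steps):
    # Feed the current user state back in as a pseudo "interaction" and re-encode,
    # letting the model refine its latent summary of user interest.
    _, user_state = encoder(user_state.transpose(0, 1), user_state)

scores = user_state.squeeze() @ item_emb.weight.T        # (num_items,)
print(scores.topk(5).indices)                            # top-5 recommended item ids
```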
Challenges and Future Directions:
- Challenges:
  - Training Difficulties: Current methods optimize for explicit outputs, potentially not fully activating internal reasoning. Latent CoT still often underperforms explicit CoT in accuracy.
  - Generalization Issues: Models may learn to compress specific templates rather than developing flexible abstract reasoning, leading to poor generalization.
  - Interpretability Concerns: The "black box" nature of latent reasoning makes error identification and understanding conclusions difficult.
- Future Directions:
  - Alternative Architectures: Exploring recurrent/looped Transformers or diffusion models [ye2024diffusionthoughtschainofthoughtreasoning, huang2025reinforcingdiffusionchainlateral].
  - Interpretability and Verification: Developing methods to probe, decode, or verify latent representations.
  - Training Approaches: Using reinforcement learning or curriculum learning.
  - LLM Agents: Applying latent CoT for more compact and faster planning and decision-making.
  - Social Intelligence and Theory of Mind: Using latent reasoning to model nested mental states.
The paper concludes that latent CoT reasoning is a promising direction for enabling more abstract, efficient, and scalable inference in LLMs. It aims to consolidate the fragmented research landscape and provide a foundation for future advancements. The authors also acknowledge the limitations of their survey, such as potential omissions due to the field's rapid evolution and the need for more rigorous empirical validation of surveyed works. Ethical considerations regarding interpretability, fairness, and safety in latent reasoning are emphasized as crucial areas for future work.