
Coconut: Continuous Chain-of-Thought for LLMs

Updated 1 July 2025
  • Coconut is a continuous reasoning framework that encodes intermediate computation as high-dimensional latent vectors rather than explicit language tokens.
  • It decouples language from reasoning by using latent chaining and superposition, which supports parallel search and efficient inference across complex tasks.
  • Empirical studies show Coconut and its variants achieve significant gains in speed, token efficiency, and accuracy on reasoning-intensive benchmarks.

Coconut (Chain of Continuous Thought) refers to a family of reasoning frameworks for LLMs and related neural architectures that enable intermediate “thought steps” to be represented and manipulated as continuous latent vectors, rather than as explicit natural language tokens. This paradigm relaxes the strict coupling between reasoning and language, allowing models to perform more information-rich, efficient, and parallelizable inference. The Coconut approach and its extensions are supported by both theoretical analyses and empirical evidence across a range of reasoning-intensive applications.

1. Theoretical Foundations and Core Principles

The hallmark of Coconut is the continuous chain of thought, where the reasoning state at each step is encoded as a high-dimensional vector (typically the model's last hidden state or a projection thereof). Rather than generating or consuming intermediate reasoning in the form of vocabulary tokens, the model advances its “thought process” by directly feeding these latent states back as inputs, resulting in what is termed “continuous thought.”

The core principles underpinning this approach include:

  • Decoupling language and reasoning: Whereas conventional CoT [Wei et al., 2022] emulates cognitive reasoning by emitting text-based rationales, Coconut reasons internally in an unrestricted latent space, circumventing the linguistic bottleneck.
  • Expressivity and superposition: The continuous latent space permits the representation of multiple alternative hypotheses or reasoning branches in superposition, enabling parallel exploration and mitigating the risk of early commitment to suboptimal paths (2505.12514, 2505.23648).
  • Efficient computation: By propagating vectors rather than sequences of tokens, Coconut-based methods reduce token consumption, can achieve faster inference, and, under certain formalizations, perform complex search or combinatorial reasoning in a single forward pass.

A typical Coconut reasoning iteration updates the latent state as

$h_{t+1} = f([e(x_1); \ldots; e(x_{t+1})])$,

where $h_{t+1}$ is the updated hidden state, $e(\cdot)$ is the embedding function, and $f$ denotes the (possibly multi-layer) transformer or neural network.
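The minimal PyTorch sketch below illustrates this feedback loop: the last hidden state is appended to the input sequence as if it were the next token embedding. The toy encoder, dimensions, and function names are illustrative stand-ins, not the implementation from any of the cited papers.

```python
# Minimal sketch of a continuous-thought (Coconut-style) loop.
# All modules and sizes here are toy stand-ins for an actual LLM.
import torch
import torch.nn as nn

d_model, vocab = 64, 100
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)

def continuous_thought(prompt_ids: torch.Tensor, num_latent_steps: int = 3):
    seq = embed(prompt_ids)                      # token embeddings of the prompt, (1, n, d)
    for _ in range(num_latent_steps):
        h = encoder(seq)                         # causal masking omitted for brevity
        last_hidden = h[:, -1:, :]               # h_t: hidden state at the last position
        # Coconut-style step: feed the hidden state back as the next "input token".
        seq = torch.cat([seq, last_hidden], dim=1)
    return seq                                   # prompt embeddings followed by continuous thoughts

print(continuous_thought(torch.tensor([[1, 5, 7]])).shape)  # (1, 3 + num_latent_steps, 64)
```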

2. Practical Methodologies and Architectural Variants

Coconut-style Continuous Reasoning

  • Latent chaining: In the basic Coconut method (2412.06769), reasoning alternates between standard language mode (token-level) and latent mode (last hidden states as next-step input). Markers such as <bot> and <eot> delimit these spans.
  • Curriculum and distillation approaches: Training can proceed via curriculum learning, gradually increasing the fraction of reasoning performed in latent space (Coconut), or via self-distillation, wherein explicit CoT is “compressed” into continuous space by matching teacher (explicit) and student (implicit/latent) hidden vectors (CODI, 2502.21074); a schematic curriculum sketch follows this list.
  • Parallelization: PCCoT (2506.18582) enables parallel update of all latent thought tokens via Jacobi iteration, drastically improving training and inference efficiency relative to the standard sequential Coconut loop.
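
As a schematic of the curriculum idea, the sketch below replaces the first k explicit CoT steps of a training example with latent-thought placeholders between <bot> and <eot>; the helper name, token strings, and data layout are hypothetical, not the exact preprocessing from the Coconut paper.

```python
# Illustrative curriculum staging in the spirit of Coconut (2412.06769):
# at stage k, the first k explicit reasoning steps are replaced by k latent
# slots that the model fills with continuous thoughts during training.
def build_stage_example(question_tokens, cot_steps, answer_tokens, stage_k):
    kept_steps = cot_steps[stage_k:]             # explicit steps that remain
    latent_slots = ["<latent>"] * stage_k        # positions later filled by hidden states
    return (
        question_tokens
        + ["<bot>"] + latent_slots + ["<eot>"]
        + [tok for step in kept_steps for tok in step]
        + answer_tokens
    )

print(build_stage_example(
    ["Q:", "2+3*4", "?"],
    [["3*4=12"], ["2+12=14"]],
    ["A:", "14"],
    stage_k=1,
))
```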

Example: Latent Chain Update (Jacobi iteration, PCCoT)

For $c$ latent tokens and $T$ Jacobi iterations:

$[h_{n+1}^{(t+1)}, \ldots, h_{n+c}^{(t+1)}] = f([E_{x_1}; \ldots; E_{x_{n+1}}; h_{n+1}^{(t)}; \ldots; h_{n+c}^{(t)}])$

All $c$ latent tokens are updated in parallel at each iteration $t$.
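
A minimal sketch of this parallel update, assuming a toy encoder in place of the LLM's own layers, might look as follows; all names and sizes are illustrative.

```python
# PCCoT-style Jacobi update (2506.18582), toy version: all c latent positions
# are refreshed together from the previous iterate rather than one at a time.
import torch
import torch.nn as nn

d_model, vocab = 64, 100
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)

def jacobi_latent_update(prompt_ids: torch.Tensor, c: int = 4, T: int = 3):
    prompt_emb = embed(prompt_ids)                     # E_{x_1}, ..., E_{x_{n+1}}
    latents = torch.zeros(1, c, d_model)               # h^{(0)}: initial latent tokens
    for _ in range(T):                                 # T Jacobi iterations
        seq = torch.cat([prompt_emb, latents], dim=1)
        h = encoder(seq)
        latents = h[:, -c:, :]                         # refresh all c latents in parallel
    return latents

print(jacobi_latent_update(torch.tensor([[1, 5, 7]])).shape)  # (1, 4, 64)
```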

Continuous Chain-of-Thought with Soft/Parallel Sampling

Continuous CoT2 (2505.23648) further generalizes reasoning to operate over distributions (superpositions) of token embeddings, allowing the model to represent all possible next steps probabilistically:

$z_t = E^\top \alpha_t$,

where $\alpha_t$ is the softmax distribution over the vocabulary at step $t$ and $E$ is the embedding matrix.

Sampling and exploration strategies—including multi-token and Dirichlet sampling—enable controlled exploration in continuous space, enhancing reasoning performance and robustness.
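
A toy numerical illustration of the superposed thought and a Dirichlet-style perturbation is given below; the embedding matrix, logits, and concentration constant are synthetic stand-ins rather than the paper's exact sampling scheme.

```python
# CoT2-style soft thought (2505.23648), toy version: z_t = E^T alpha_t is a
# probability-weighted mixture of token embeddings rather than a single token.
import torch
import torch.nn.functional as F

vocab, d_model = 100, 64
E = torch.randn(vocab, d_model)              # token embedding matrix
logits = torch.randn(vocab)                  # stand-in for the model's next-token logits
alpha = F.softmax(logits, dim=-1)            # distribution over the vocabulary

z_det = alpha @ E                            # deterministic superposed thought, (d_model,)

# Exploration: resample the mixture weights from a Dirichlet centered near alpha.
alpha_noisy = torch.distributions.Dirichlet(alpha * 50.0 + 1e-4).sample()
z_explore = alpha_noisy @ E

print(z_det.shape, z_explore.shape)
```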

Test-Time Scaling in Latent Space

SoftCoT++ (2505.11484) enables test-time scaling by generating multiple diverse “latents” through specialized initial tokens and contrastive losses, addressing the determinism constraint (single fixed latent per input) inherent to most continuous CoT implementations.
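
The sketch below conveys the flavor of encouraging several distinct latents per input with a contrastive-style penalty; the loss form and all names are illustrative and should not be read as SoftCoT++'s actual objective.

```python
# Diversity pressure on k candidate soft thoughts for the same input,
# loosely in the spirit of SoftCoT++ (2505.11484). Purely illustrative.
import torch
import torch.nn.functional as F

def diversity_loss(latents: torch.Tensor) -> torch.Tensor:
    # latents: (k, d) -- k candidate soft thoughts for one input.
    z = F.normalize(latents, dim=-1)
    sim = z @ z.T                                   # pairwise cosine similarities
    off_diag = sim - torch.eye(sim.size(0))         # zero out the self-similarity diagonal
    return off_diag.clamp(min=0).mean()             # penalize near-duplicate latents

latents = torch.randn(4, 64, requires_grad=True)    # k = 4 specialized initial latents
loss = diversity_loss(latents)
loss.backward()
print(loss.item())
```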

3. Theoretical Results and Efficiency Gains

Several works establish the formal superiority of Coconut-style reasoning in certain regimes:

  • Parallel search and superposition: Continuous thought vectors can encode the full search frontier during reasoning (e.g., all reachable nodes in a graph at each BFS step) (2505.12514), rather than tracking a single sampled path as in discrete CoT. This yields exponential reductions in the number of required inference steps for combinatorial tasks such as subset sum or graph reachability (a toy illustration follows this list).
  • Jacobi-style parallelization: PCCoT (2506.18582) proves that, with sufficient Jacobi iterations, parallel latent-token updates reproduce the sequential dependency structure while delivering substantial wall-clock savings (nearly 50% less time than sequential continuous CoT).
  • Compression and scalability: The CODI framework (2502.21074) matches explicit CoT performance on GSM8k with a 3.1x reduction in step count and inference time, while also demonstrating superior generalization on adversarial/math tasks.
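
The toy example below makes the superposition picture concrete: the whole BFS frontier of a small DAG is carried as one weighted mixture of node embeddings and expanded in a single matrix multiply per step. The graph, embeddings, and update rule are illustrative only.

```python
# "Reasoning by superposition" (2505.12514), toy picture: one continuous
# thought encodes the entire BFS frontier instead of a single sampled path.
import torch

num_nodes, d = 8, 16
node_emb = torch.randn(num_nodes, d)                  # one embedding per graph node
adj = torch.zeros(num_nodes, num_nodes)
adj[0, 1] = adj[0, 2] = adj[1, 3] = adj[2, 4] = 1.0   # a small DAG

frontier = torch.zeros(num_nodes)
frontier[0] = 1.0                                     # start from node 0
for step in range(2):
    frontier = (frontier @ adj).clamp(max=1.0)        # expand every path in parallel
    thought = frontier @ node_emb                     # superposed frontier as one vector
    print(f"step {step}: frontier nodes =",
          frontier.nonzero().flatten().tolist(),
          "| thought shape:", tuple(thought.shape))
```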

4. Empirical Performance and Benchmarks

Substantial evidence demonstrates the effectiveness of continuous chain-of-thought reasoning:

| Method | GSM8k Acc. | ProsQA Acc. | Token efficiency | Robustness (OOD) |
|---|---|---|---|---|
| CoT (explicit) | 44.1% | 77.5% | High token count | Varies |
| Coconut | 34.1% | 97.0% | Fewer tokens | Moderate |
| CODI | 43.7% | – | 3.1x compression | Strong |
| PCCoT | 49.5% | – | 50% time saving | Strong, low variance |
| SoftCoT++ | – | – | Scalable/diverse | State of the art |
  • On planning-intensive tasks like ProsQA (multiple reasoning hops in a DAG), Coconut and its extensions consistently outperform discrete and language-based baselines, especially in cases requiring backtracking and parallel search.
  • Parallel continuous reasoning (PCCoT) provides best-in-class accuracy and reproducibility across seeds on GSM8k-Aug and GSM8k-Aug-NL, with up to 2x speedup in training/inference.
  • CODI is the first implicit CoT method to match explicit CoT’s performance on GSM8k and generalizes well out-of-domain.

5. Applications and Broader Impact

Coconut and its variants address a spectrum of advanced reasoning tasks:

  • Mathematical and logical reasoning: Parallel, compositional, and backtracking-intensive problems (arithmetic word problems, subset sum, graph reachability).
  • Long-context and process-supervised tasks: Enabling efficient reasoning in ultra-long contexts by combining latent thought compression with process supervision (LongRePS (2502.20790)).
  • Dialogue and multimodal systems: Structurally aligning spoken dialogue systems with CoT motifs to boost efficiency and naturalness (2506.00722).
  • Meta-reasoning and agent planning: Allowing models to branch, aggregate, and explore in high-dimensional latent space, thus opening the door for more robust and agentic planning systems.

6. Interpretability, Robustness, and Open Challenges

  • Interpretability: Methods such as CODI (2502.21074) demonstrate that continuous thoughts can often be decoded back to meaningful intermediate results, albeit with a trade-off: continuous representations are less human-legible than natural-language rationales (a simple decoding probe is sketched after this list).
  • Stability: PCCoT (2506.18582) provides empirical evidence for improved stability and convergence over both discrete and original continuous CoT, especially with respect to initialization and number of parallel steps.
  • Generalization: Empirical findings indicate robust generalization across out-of-domain tasks and increased data efficiency.
  • Open challenges: Training and supervising latent reasoning, ensuring traceability and error localization, and scaling to more diverse modalities and agent architectures remain active areas of investigation. Future work aims to further close the gap between efficiency and interpretability and to develop architectures that maximize the potential of continuous latent reasoning.
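
As referenced in the interpretability bullet above, a simple probe of this kind can be sketched as follows; the embedding matrix and similarity-based decoding are toy stand-ins rather than CODI's actual probing procedure.

```python
# Nearest-token probe for inspecting a continuous thought (cf. CODI, 2502.21074).
# All tensors here are synthetic placeholders.
import torch
import torch.nn.functional as F

vocab, d_model = 100, 64
E = torch.randn(vocab, d_model)                        # token embedding matrix

def nearest_tokens(thought: torch.Tensor, k: int = 5):
    sims = F.cosine_similarity(thought.unsqueeze(0), E, dim=-1)   # (vocab,)
    return torch.topk(sims, k).indices.tolist()

thought = torch.randn(d_model)                         # stand-in continuous thought
print(nearest_tokens(thought))                         # ids of the closest tokens
```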

Summary Table: Core Coconut and Variants

| Aspect | Coconut (Baseline) | PCCoT | CODI | SoftCoT++ |
|---|---|---|---|---|
| Reasoning style | Sequential latent | Parallel latent | Distilled latent | Scalable latent |
| Efficiency | Fewer tokens | ~2x faster | 3x compression | Test-time scaling |
| Planning capability | Parallel/BFS | Parallel/BFS | Distilled | Diverse, robust |
| Interpretability | Lower | Lower | Decodable | N/A |
| Robustness | Moderate | Strong | Strong | Strong |
| Theoretical coverage | BFS, superposition | Full via Jacobi | CoT shift | KL-contrastive |

References and Further Resources

  • (2412.06769) Training LLMs to Reason in a Continuous Latent Space
  • (2502.21074) CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
  • (2506.18582) Parallel Continuous Chain-of-Thought with Jacobi Iteration
  • (2505.23648) Continuous Chain of Thought Enables Parallel Exploration and Reasoning
  • (2505.12514) Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
  • (2505.11484) SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
  • (2502.20790) Chain-of-Thought Matters: Improving Long-Context LLMs with Reasoning Path Supervision

Coconut (Chain of Continuous Thought) has thus established itself as a foundational paradigm for efficient, scalable, and parallel neural reasoning, unifying advances in representation, optimization, and empirical capability across a wide array of cognitive tasks in artificial intelligence.