Implicit Chain-of-Thought

Updated 4 July 2026

ICoT is a framework where multi-step reasoning is internalized into latent states, hidden tokens, or compact representations rather than fully articulated rationales.
Methods such as hidden-state distillation, low-rank alignment, and tokenized reasoning show that compressed reasoning can approach, and sometimes match, explicit approaches while reducing latency and token cost.
Empirical studies and theoretical analyses reveal that maintaining alignment with explicit rationales is crucial for ensuring accuracy in complex tasks such as mathematical problem solving, code generation, and multimodal applications.

Implicit Chain-of-Thought (ICoT) denotes a family of approaches in which multi-step reasoning is retained during training or internal computation but is not fully emitted as a long natural-language rationale at inference time. Across papers, the term is used in several related senses: latent hidden-state reasoning distilled from explicit CoT, discrete functional or semantically aligned reasoning tokens, structured intermediate representations used as supervision or prompting scaffolds, and, in some analyses, a mechanism by which successful answers are driven primarily by internal pattern matching rather than the visible rationale (Deng et al., 2023, Zheng et al., 7 Apr 2025, Solgi et al., 3 Jun 2026).

1. Definitions and scope

Recent work does not treat ICoT as a single standardized method. Instead, it is an umbrella for several distinct but convergent ideas: internalizing explicit reasoning, compressing it, aligning latent reasoning with explicit traces, or replacing long rationales with compact structured surrogates.

Usage of ICoT	Representative work	Reasoning carrier
Mechanism in pattern-based ICL	(Zheng et al., 7 Apr 2025)	Latent pattern matching
Hidden-state internalization	(Deng et al., 2023, Solgi et al., 3 Jun 2026)	Hidden states or latent trajectories
Tokenized latent reasoning	(Wei et al., 24 Sep 2025, Lee et al., 27 May 2026, He et al., 28 Oct 2025)	Implicit tokens or functional tokens
Prompted structured reasoning	(Li et al., 16 Dec 2025, Zeng et al., 8 May 2025)	Specification/Idea or JSON state
Multimodal latent planning	(Yang et al., 20 Feb 2026, Li et al., 23 Jun 2026, Yuen et al., 2024)	`<think>` traces, visual queries, internalized ASR

In "The Curse of CoT," ICoT is not a named method but a mechanism: in pattern-based in-context learning, models often answer correctly through latent pattern matching even when the written rationale is weak or wrong (Zheng et al., 7 Apr 2025). In "Implicit Chain of Thought Reasoning via Knowledge Distillation," ICoT is a hidden-state formulation in which training uses CoT supervision but inference uses only latent reasoning and direct answer generation (Deng et al., 2023). In "Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation," ICoT is a structured prompting scheme centered on “Specification” and “Idea,” used selectively on complex code tasks (Li et al., 16 Dec 2025). In "BLM-Guard," it is a rule-driven multimodal supervision and reward design whose <think> segment is primarily a training signal rather than a mandatory deployment artifact (Yang et al., 20 Feb 2026).

A common denominator is that explicit rationales are no longer treated as the only or even the primary substrate of reasoning. What changes from paper to paper is the substrate: hidden trajectories, latent tokens, structured prompts, policy-aligned reasoning traces, or multimodal latent plans.

2. Mechanistic interpretations of implicit reasoning

A major line of work treats ICoT as a property of how models already solve tasks. In pattern-based in-context learning, "The Curse of CoT" formalizes direct answering as a one-stage mapping from demonstrations and query to the answer, and CoT as a two-stage process that first generates a rationale and then conditions the answer on that rationale. Its central result is an explicit–implicit duality: in successful CoT cases, implicit reasoning often compensates for incorrect explicit reasoning, and this happens far more often than the reverse—about $7.5\times$ on List Function and $3.6\times$ on MiniSCAN (Zheng et al., 7 Apr 2025). The same paper argues that CoT can hurt pattern-based ICL because explicit pattern inference is weak or noisy and because inserted rationales increase contextual distance between demonstrations and the final answer.
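
The one-stage versus two-stage distinction can be written compactly. The notation below is an assumption made for illustration ($D$ for the demonstrations, $x$ for the query, $y$ for the answer, $r$ for the written rationale, $p_\theta$ for the model) and is not taken from the paper:

$$
\text{direct:}\quad y \sim p_\theta(y \mid D, x), \qquad
\text{CoT:}\quad r \sim p_\theta(r \mid D, x),\;\; y \sim p_\theta(y \mid D, x, r).
$$

Under this reading, the CoT answer is conditioned on a longer context in which $r$ sits between $D$ and $y$, which is where the contextual-distance effect would enter.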

A second mechanistic account comes from "Chain-of-Thought Tokens are Computer Program Variables." On multi-digit multiplication and dynamic programming, the paper finds that preserving only tokens that store intermediate results achieves comparable performance, that storing those intermediate results in an alternative latent form does not affect model performance, and that random interventions on those values change later CoT tokens and the final answer correspondingly (Zhu et al., 8 May 2025). This frames CoT tokens as variable-like state carriers rather than as purely expository text, and it supports the view that ICoT can replace visible variables with latent ones.
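
The variable-like reading can be illustrated with a toy example. The sketch below is not the paper's experimental setup: it multiplies two non-negative integers while keeping only the running totals (the "variables"), and then shows that overwriting one stored value changes the final answer accordingly. The function names `multiply_with_variables` and `resume_from` are hypothetical.

```python
def multiply_with_variables(a: int, b: int) -> tuple:
    """Multiply a by b one digit of b at a time, keeping only running totals.

    Returns (stored_values, product), where stored_values[i] is the running
    total after processing the i-th least significant digit of b.
    """
    running_total = 0
    stored_values = []
    for position, digit_char in enumerate(reversed(str(b))):
        running_total += a * int(digit_char) * 10 ** position
        stored_values.append(running_total)
    return stored_values, running_total


def resume_from(value: int, index: int, a: int, b: int) -> int:
    """Continue the computation from a (possibly overwritten) stored value.

    `value` stands in for stored_values[index]; later digits of b are then
    processed exactly as in multiply_with_variables.
    """
    digits = str(b)[::-1]
    running_total = value
    for position in range(index + 1, len(digits)):
        running_total += a * int(digits[position]) * 10 ** position
    return running_total


stored, product = multiply_with_variables(123, 45)
print(stored, product)                    # running totals and final product
print(resume_from(600, 0, 123, 45))       # intervention on the first variable
```

Nothing besides the stored values is needed to resume the computation, which is the sense in which such tokens behave like state rather than exposition.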

A third account, "How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation," argues that CoT may act as a decoding space pruner. The paper reports that higher template adherence strongly correlates with improved performance, that CoT raises the probability concentration around answer-template tokens, and that it modulates neuron engagement in a task-dependent way: reducing neuron activation in open-domain tasks yet increasing it in closed-domain scenarios (Yang et al., 28 Jul 2025). This suggests that part of CoT’s utility lies in constraining the output trajectory rather than merely verbalizing a solution.

Taken together, these analyses imply that ICoT is not merely “hidden CoT text.” It is a broader claim that effective reasoning may be implemented as internal state transitions, latent variable updates, or constrained decoding regimes, with the visible rationale serving only as one externalization of that process.

3. Hidden-state internalization and theoretical foundations

The earliest explicit hidden-state formulation in this set is "Implicit Chain of Thought Reasoning via Knowledge Distillation." There, an explicit-CoT teacher provides hidden states over CoT tokens, and a student learns to reason “vertically” across layers rather than “horizontally” through generated rationale tokens. On multi-digit multiplication and grade-school math, this makes it possible to solve tasks that were previously not solvable without explicit CoT, at a speed comparable to answering with no chain-of-thought (Deng et al., 2023).

"LoRi: Low-Rank Distillation for Implicit Reasoning" develops this hidden-state view further by modeling teacher reasoning as a trajectory $H_T \in \mathbb{R}^{N \times L \times H}$ and student reasoning as a shorter latent trajectory $H_S \in \mathbb{R}^{K \times L \times H}$ . It projects both into a shared low-rank subspace and aligns first- and second-order statistics rather than individual steps. The paper argues that hidden-state reasoning trajectories exhibit low-rank structure and reports that the resulting student is $5$– $7\times$ faster than explicit CoT while approaching explicit CoT accuracy on mathematical reasoning benchmarks (Solgi et al., 3 Jun 2026).

A theoretical explanation for when such compression should or should not work is provided by "Chain Of Thought Compression: A Theoritical Analysis." It introduces Order-$r$ Interaction and shows that the useful gradient for an Order-$r$ interaction decays as $O(m^{-r})$, implying sample complexity $n = \Omega(m^{2(r-1)} \log m)$ for irreducible high-order reasoning. In that framework, naive ICoT becomes hard when skipped steps force the model to learn high-order dependencies directly. The proposed ALiCoT aligns latent token distributions with intermediate reasoning states, and on NatBool-DAG it achieves a speedup over explicit CoT while maintaining comparable performance (Li et al., 29 Jan 2026).
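
To see how quickly the bound grows, it helps to substitute small values of $r$ into $n = \Omega(m^{2(r-1)} \log m)$. This is only a worked reading of the stated formula, with constants omitted and $m$ left as the size parameter used in the paper's bound:

$$
r = 1:\; n = \Omega(\log m), \qquad
r = 2:\; n = \Omega(m^{2} \log m), \qquad
r = 3:\; n = \Omega(m^{4} \log m).
$$

Each additional order multiplies the polynomial factor by $m^{2}$, while the useful gradient $O(m^{-r})$ shrinks by a further factor of $m$. Skipping steps that raise the effective order is therefore expensive in both signal strength and data.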

A complementary theory appears in "Transformers Provably Learn to Internalize Chain-of-Thought." That paper proves that a transformer trained under Log-ICoT learns a parity task through a staged training schedule, matching explicit CoT sample efficiency while eliminating inference overhead (Huang et al., 27 May 2026). Its main conceptual contribution is that intermediate reasoning can be removed in geometric chunks aligned with the reasoning graph, rather than one token at a time.
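
The difference between token-by-token and chunked removal can be made concrete by counting curriculum stages. The sketch below is an illustration under a strong assumption: it takes "geometric" to mean halving the number of visible intermediate steps at each stage, which may not match the paper's graph-aligned schedule. Both function names are hypothetical.

```python
def linear_schedule(num_steps: int) -> list:
    """Visible intermediate steps per stage when one step is removed per stage."""
    return list(range(num_steps, -1, -1))


def geometric_schedule(num_steps: int) -> list:
    """Visible intermediate steps per stage when the count is halved per stage
    (assumed schedule, for illustration only)."""
    visible = [num_steps]
    while visible[-1] > 0:
        visible.append(visible[-1] // 2)
    return visible


print(len(linear_schedule(16)), linear_schedule(16)[:4])
print(len(geometric_schedule(16)), geometric_schedule(16))
```

Under this assumption the number of stages grows logarithmically with the length of the reasoning chain instead of linearly.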

These works collectively establish a rigorous core for ICoT: explicit CoT can be viewed as supervision for an internal computation that later becomes latent, compressed, and layer-distributed.

4. Tokenized, calibrated, and function-level latent reasoning

A second family of methods keeps reasoning latent but makes it more structured than raw hidden-state distillation. "SIM-CoT: Supervised Implicit Chain-of-Thought" diagnoses a latent instability problem: increasing the number of implicit reasoning tokens can cause training collapse because latent states become homogeneous and lose semantic diversity. It addresses this with an auxiliary decoder that aligns each implicit token with its corresponding explicit reasoning step. The auxiliary decoder is removed at inference. The paper reports improvements over Coconut on GPT-2 and over CODI on LLaMA-3.1 8B, and it surpasses the explicit CoT baseline on GPT-2 while using tokens more efficiently (Wei et al., 24 Sep 2025).

"Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought" treats latent thoughts as continuous tokens generated by an assistant model and then refined by an energy-based model before the base model produces explicit reasoning and the final answer. The energy model is trained so that low-energy latent trajectories correspond to more coherent reasoning. The reported effect is a strong gain in single-chain consistency, often matching multi-chain self-consistency with far fewer sampled chains (Chen et al., 10 Nov 2025).

"SemCoT" addresses two issues that earlier latent-token methods often neglect: semantic alignment with explicit rationales and the time cost of generating each implicit token. It uses a contrastively trained sentence transformer to preserve semantic alignment between implicit and explicit reasoning and a lightweight LLM to generate implicit tokens more cheaply than the base model. The paper reports that one implicit token is often sufficient, that more tokens frequently hurt accuracy while increasing latency, and that SemCoT yields superior efficiency and effectiveness relative to state-of-the-art baselines on Llama-2-7B-chat and Mistral-7B-Instruct (He et al., 28 Oct 2025).

"CIRF" converts explicit CoT into a dynamic sequence of discrete functional tokens. Each semantically coherent reasoning step is mapped to a reusable functional unit, and the model is fine-tuned to autoregressively generate functional tokens, optional result snippets, and then the final answer. The paper reports extreme compression of visible reasoning length: on GSM8K, an average of $3.6\times$ 9 CoT tokens becomes $H_T \in \mathbb{R}^{N \times L \times H}$ 0 functional tokens, and on CommonsenseQA, $H_T \in \mathbb{R}^{N \times L \times H}$ 1 CoT tokens becomes $H_T \in \mathbb{R}^{N \times L \times H}$ 2 functional tokens (Lee et al., 27 May 2026). CIRF’s significance is that it restores a coarse form of interpretability while staying within the efficiency regime of ICoT.

Across these methods, the central design question is no longer whether reasoning should be explicit or implicit, but how latent reasoning should be constrained: by step supervision, energy calibration, semantic similarity, or reusable functional units.

5. Prompting-based and task-adaptive ICoT

Not all ICoT work relies on hidden-state distillation. In code generation, "Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation" defines ICoT as a compact two-part intermediate representation: Specification and Idea, where the latter includes the core algorithmic logic and time complexity. This ICoT is invoked only for tasks classified as Complex; Simple tasks use few-shot direct generation. The resulting RoutingGen framework achieves state-of-the-art performance in most settings while reducing total token usage on average across settings, and ICoT outperforms six existing prompting baselines on challenging benchmarks (Li et al., 16 Dec 2025).
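
The routing logic can be sketched in a few lines. This is an illustration under stated assumptions, not RoutingGen itself: the keyword heuristic in `classify_difficulty` is a hypothetical stand-in for the paper's difficulty classifier, the prompt wording is invented, and the few-shot examples used on the Simple path are omitted.

```python
def classify_difficulty(task: str) -> str:
    """Hypothetical stand-in router: flags a task as Complex on keyword match."""
    hard_markers = ("dynamic programming", "graph", "optimal", "minimum", "maximum")
    lowered = task.lower()
    return "Complex" if any(marker in lowered for marker in hard_markers) else "Simple"


def build_prompt(task: str) -> str:
    """Use direct generation for Simple tasks and the two-part ICoT otherwise."""
    if classify_difficulty(task) == "Simple":
        return f"Task: {task}\nWrite the code directly."
    return (
        f"Task: {task}\n"
        "Specification: <inputs, outputs, constraints>\n"
        "Idea: <core algorithmic logic and time complexity>\n"
        "Then write the code."
    )


print(build_prompt("Reverse a string."))
print(build_prompt("Find the minimum number of coins for a target sum."))
```

Only the second task pays the token cost of the intermediate representation, which is how selective routing can lower total token usage.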

This prompting-based usage broadens the scope of ICoT. Here, “implicit” does not mean hidden states alone; it means that the reasoning representation is deliberately compressed and structured so that the model captures task intention without producing a long free-form rationale. The representation is explicit enough to guide generation, but compact enough to avoid uniform overthinking.

A plausible implication is that ICoT should be understood as a design principle—selective, structured, and compressed reasoning—rather than as a single implementation recipe. In this view, dynamic routing and rationale compression are as central as hidden-state internalization.

6. Multimodal and application-specific ICoT

Several papers extend ICoT beyond text-only reasoning. In "BLM-Guard," ICoT is a rule-driven multimodal reasoning supervision scheme for short-video ad moderation. A frozen InternVL-3-78B generates structured reasoning chains from selected keyframes, ASR transcripts, and policy rules; a Qwen2.5-VL-7B model is then trained with rule-anchored SFT and critic-guided RL. In the main ablation, Rule-SFT + Rule-RL + SCA-R (Full) reaches s-Acc: 0.914 and Con.: 0.845, showing that policy-aligned implicit reasoning can be both accurate and auditable (Yang et al., 20 Feb 2026).

"IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation" implements ICoT entirely in latent visual space. The method splits visual conditioning into structural queries $H_T \in \mathbb{R}^{N \times L \times H}$ 4 and semantic queries $H_T \in \mathbb{R}^{N \times L \times H}$ 5, with causal ordering enforcing a structural-to-semantic cascade. Training-only sketch supervision makes $H_T \in \mathbb{R}^{N \times L \times H}$ 6 encode object counts, coarse layouts, and spatial relations without requiring sketch extraction at inference. On GenEval, IV-CoT achieves 0.88 overall, and on T2I-CompBench it achieves 0.5743, while latency is 1.693 s compared with 15.261 s for T2I-R1, 16.744 s for GoT-R1, and 25.362 s for TWIG-RL (Li et al., 23 Jun 2026).

"Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM" applies the ICoT curriculum to speech. A speech LLM is first trained on an explicit Audio–Text–Text–Audio chain, then the ASR transcript tokens are gradually removed so that the model goes directly from input audio tokens to text response and output audio tokens. The latency to first output audio falls from 1.09 s for A–T–T–A finetuning to 0.87 s for A–T–A ASR ICoT, while maintaining substantially better response quality than a direct A–T–A baseline without the ICoT curriculum (Yuen et al., 2024).

A text-only application appears in "LLM-driven Security Assistant for Internet of Things via Chain-of-Thought." There, ICoT is a two-stage prompting framework: first produce a structured JSON state summarizing vulnerability and user characteristics, then condition the response on that state and the accompanying inputs, including the user query. The paper reports consistent gains over “LLM only” prompting across Reliability, Relevance, Detail, Technicality, and Friendliness, especially because responses are tailored to user identity and expertise level (Zeng et al., 8 May 2025).
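
The two-stage structure can be sketched as follows. This is an illustration only: in the paper the state is produced by the model in the first stage, whereas here it is assembled by hand, and the field names, prompt wording, and function names are all hypothetical.

```python
import json


def build_state(vulnerability: dict, user: dict) -> str:
    """Stage 1 (stand-in): serialize a structured state as JSON."""
    state = {"vulnerability": vulnerability, "user": user}
    return json.dumps(state, sort_keys=True)


def build_response_prompt(state_json: str, query: str) -> str:
    """Stage 2: condition the answer on the JSON state and the user query."""
    return (
        "Context state (JSON):\n" + state_json + "\n"
        "User question:\n" + query + "\n"
        "Answer at a level matching the user's expertise."
    )


state_json = build_state(
    {"device": "smart camera", "issue": "default password"},
    {"role": "home user", "expertise": "beginner"},
)
print(build_response_prompt(state_json, "Is my camera safe?"))
```

Keeping the intermediate state as a small JSON object, rather than a free-form rationale, makes it easy to inspect and to reuse across turns.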

These application papers show that ICoT is not confined to latent text reasoning. It can be implemented as policy-aligned supervision, latent visual planning, internalized speech transcription, or structured intermediate states in safety-critical domains.

7. Empirical profile, limits, and open questions

The literature presents ICoT as both an opportunity and a boundary condition. On the positive side, hidden-state distillation, low-rank trajectory matching, step-level supervision, energy calibration, semantically aligned implicit tokens, and discrete functional units all show that compact reasoning can approach or sometimes match explicit CoT while reducing latency or token cost (Solgi et al., 3 Jun 2026, Wei et al., 24 Sep 2025, Lee et al., 27 May 2026).

At the same time, several papers emphasize that compression is not universally benign. "The Curse of CoT" shows that for pattern-based ICL, explicit CoT is systematically harmful and direct answering is usually the best-performing strategy; the same paper explicitly notes that CoT is “particularly effective” for math, symbolic proofs, code, and abstract probabilistic or logical reasoning, but not for pattern-based input–output rule induction (Zheng et al., 7 Apr 2025). "Chain Of Thought Compression" formalizes a related boundary: compression complexity scales with the number of compressed steps and with logical density, so irreducible reasoning tasks face an interaction barrier rather than a free compression gain (Li et al., 29 Jan 2026). "SIM-CoT" adds that simply increasing implicit reasoning capacity can destabilize training unless the latent space is grounded by step-level supervision (Wei et al., 24 Sep 2025).

A recurring theme is therefore alignment. ICoT works best when the latent process remains tethered to something explicit: teacher hidden states, intermediate reasoning states, step labels, semantic similarity, reusable functional units, policy rules, or sketch supervision. This suggests that the field is converging on a hybrid view. Explicit CoT remains valuable as supervision, diagnosis, and interpretability scaffolding; ICoT becomes the deployment form when latency, token efficiency, or multimodal bandwidth make full rationales undesirable.

In that sense, Implicit Chain-of-Thought is best understood not as the disappearance of reasoning, but as its relocation—from visible natural-language traces into latent trajectories, compact tokens, structured states, or ordered multimodal query cascades.