Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

Published 16 Mar 2026 in cs.AI and cs.LG | (2603.15500v1)

Abstract: LLMs often exhibit Aha moments during reasoning, such as apparent self-correction following tokens like "Wait," yet their underlying mechanisms remain unclear. We introduce an information-theoretic framework that decomposes reasoning into procedural information and epistemic verbalization - the explicit externalization of uncertainty that supports downstream control actions. We show that purely procedural reasoning can become informationally stagnant, whereas epistemic verbalization enables continued information acquisition and is critical for achieving information sufficiency. Empirical results demonstrate that strong reasoning performance is driven by uncertainty externalization rather than specific surface tokens. Our framework unifies prior findings on Aha moments and post-training experiments, and offers insights for future reasoning model design.


Summary

The paper introduces an information-theoretic model that formalizes reasoning in LLMs as strategic information allocation under uncertainty.
It distinguishes procedural steps from epistemic verbalization, showing how explicit uncertainty expressions enable self-correction.
Empirical results demonstrate that models with epistemic tokens maintain higher performance and effectively navigate reasoning collapse modes.

Information-Theoretic Analysis of Reasoning in LLMs: Epistemic Verbalization under Uncertainty

Theoretical Framework and Motivation

This paper introduces an information-theoretic model of reasoning in LLMs, formalizing reasoning as strategic information allocation under uncertainty (2603.15500). The central theoretical construct is the separation of procedural information—step-by-step computation, symbolic manipulation, and subtask execution—from epistemic verbalization, defined as the explicit externalization of uncertainty about the reasoning trajectory at the token level. Rather than treating specific tokens (e.g., "Wait") or Aha moments as core mechanisms, the paper emphasizes uncertainty externalization as the key intervention that enables self-correction and continued information acquisition.

The analysis is situated in the closed-world inference setting, where LLMs operate without external evidence and rely exclusively on internal belief transformation. The reasoning process is modeled as self-conditioning: internally generated representations reshape the predictive distribution $P_\theta(Y \mid s_t)$. The objective is to produce a reasoning trace $s_T$ that minimizes entropy over the target variable $Y$, which establishes information sufficiency as a necessary condition for task success.
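To make the entropy objective concrete, the following minimal sketch computes the Shannon entropy of a predictive distribution over candidate answers. The two distributions and the four-answer setup are hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Hypothetical predictive distributions P(Y | s_t) over four candidate answers.
early = [0.25, 0.25, 0.25, 0.25]  # early in the trace: no information about Y yet
late = [0.97, 0.01, 0.01, 0.01]   # late in the trace: mass concentrated on one answer

h_early = entropy_bits(early)
h_late = entropy_bits(late)
print(f"H(Y | s_early) = {h_early:.3f} bits, H(Y | s_late) = {h_late:.3f} bits")
```

Under this reading, a trace is informationally sufficient when the entropy of the final distribution is close to zero.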

Limits of Procedural Reasoning and Collapse Modes

Procedural reasoning, implemented via step-wise execution and task decomposition (Chain-of-Thought, CoT), is shown to suffer from fundamental limitations. When the execution trajectory diverges from the intended path—either due to unidentifiable subtasks or intermediate misjudgments—further procedural continuation fails to reduce uncertainty about the correct answer, resulting in informational stagnation.

Three collapse modes are identified:

Recursive step expansion: The model resorts to brute-force substitutions or repetitive steps.
Problem injection: The model shifts to solving a different problem without explicit recognition.
Degenerate loops: The model repeats words, tokens, or structures without progress.

Figure 1: Three common modes of reasoning collapse in procedural reasoning, illustrating recursive expansion, problem injection, and degenerate loops.

Once procedural divergence occurs, the conditional entropy of the target variable cannot converge to zero. This is formally shown via information-theoretic arguments, demonstrating that purely procedural continuation is insufficient for recovery or error correction.
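One way to read the stagnation claim uses the chain rule for conditional entropy. This is a sketch under the assumption that the steps generated after divergence, written here as $s_{t+1:T}$ (notation introduced for illustration), carry no further information about $Y$ once $s_t$ is known. It is not the paper's own statement or proof. Since $s_T$ extends $s_t$:

$$
H(Y \mid s_T) = H(Y \mid s_t) - I(Y; s_{t+1:T} \mid s_t)
$$

If $I(Y; s_{t+1:T} \mid s_t) = 0$, then $H(Y \mid s_T) = H(Y \mid s_t)$, so any uncertainty remaining at the point of divergence persists no matter how many further procedural steps are generated.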

Epistemic Verbalization: Uncertainty as Actionable Information

Token-level uncertainty (entropy) is often locally low even when reasoning is globally incorrect, so it fails to capture trajectory-level uncertainty. The paper introduces the notion of epistemic verbalization: internal assessments (e.g., "Is that step correct?") acquire causal efficacy only when externalized in the reasoning trace. Epistemic verbalization is thus not a superficial artifact but an informational signal that enables continued belief refinement and supports downstream control actions such as self-correction.

Analysis reveals that epistemic verbalization is critical for breaking procedural stagnation and achieving information sufficiency. A theoretical proposition demonstrates that sporadic epistemic updates guarantee continued reduction of conditional entropy, formalizing the benefit of uncertainty externalization.
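The contrast between stagnation and continued entropy reduction can be illustrated with a toy Bayesian update. This is a hedged sketch, not the paper's proposition: the likelihood vectors, the four-candidate setup, and the schedule of one epistemic step every three steps are all hypothetical. A diverged procedural step is modeled as an observation that is equally likely under every candidate answer, and an epistemic step as a noisy check that favors the correct candidate.

```python
import math

def entropy_bits(belief):
    """Shannon entropy (in bits) of a discrete belief."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)

def bayes_update(belief, likelihoods):
    """Posterior proportional to prior times likelihood of the observed signal."""
    unnorm = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical likelihoods P(signal | Y = y) for four candidate answers.
UNINFORMATIVE = [0.5, 0.5, 0.5, 0.5]  # diverged procedural step: same under every y
EPISTEMIC = [0.8, 0.2, 0.2, 0.2]      # noisy check that favors candidate 0

def run(num_steps, epistemic_every=None):
    """Return the entropy after each step, starting from a uniform belief."""
    belief = [0.25] * 4
    history = [entropy_bits(belief)]
    for step in range(1, num_steps + 1):
        is_epistemic = epistemic_every is not None and step % epistemic_every == 0
        belief = bayes_update(belief, EPISTEMIC if is_epistemic else UNINFORMATIVE)
        history.append(entropy_bits(belief))
    return history

procedural_only = run(9)
with_epistemic = run(9, epistemic_every=3)
print("procedural only:", [round(h, 3) for h in procedural_only])
print("with epistemic: ", [round(h, 3) for h in with_epistemic])
```

In this toy run the procedural-only entropy never moves, while the sporadic epistemic steps lower it each time they occur. In general a single Bayesian update can raise the realized entropy; only the expected conditional entropy is guaranteed not to increase.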

Mutual Information Dynamics: Evaluative Expressions Drive Progress

Empirical investigation connects epistemic verbalization to mutual-information peaks ("MI peaks") in reasoning traces. While "thinking tokens" (e.g., "Wait", "Hmm") have been previously correlated with information surges, the paper finds that elevated mutual information is associated not with the tokens themselves, but with evaluative behaviors—explicit epistemic verbalizations.
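For reference, mutual information between two discrete variables can be computed directly from their joint distribution. The sketch below shows only the definition on hypothetical 2x2 tables. Estimating MI between a model's internal representations and the correct answer requires a dedicated estimator, which this summary does not describe.

```python
import math

def mutual_information_bits(joint):
    """Mutual information I(X; Y) in bits, where joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]        # marginal over rows
    py = [sum(col) for col in zip(*joint)]  # marginal over columns
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Hypothetical joint tables: X is some binary feature of the trace, Y the answer.
independent = [[0.25, 0.25], [0.25, 0.25]]  # X says nothing about Y
dependent = [[0.5, 0.0], [0.0, 0.5]]        # X determines Y

mi_independent = mutual_information_bits(independent)
mi_dependent = mutual_information_bits(dependent)
print(mi_independent, mi_dependent)
```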

Figure 2: Token-level analysis of mutual information (MI) shows high MI corresponds to evaluative, epistemic behaviors rather than specific tokens.

For instance, in the AIME24 #7 problem, only models employing epistemic verbalization sustain information gain and self-correct to the correct answer, whereas procedural-only models stagnate.

Figure 3: On an AIME24 problem, models using epistemic verbalization maintained information gain and self-corrected, while procedural-only models failed.

This pattern indicates that epistemic verbalization provides actionable structure, facilitating recovery and adaptation during reasoning.

Empirical Analysis: Uncertainty Expression and Model Capacity

The study investigates how uncertainty is verbalized across model capacities and task difficulties. On difficult problems (AIME24/25), high-reasoning models generate longer responses and more epistemic verbalizations, and the effect is most pronounced in smaller models. As task difficulty increases, smaller models show higher rates of epistemic token occurrence, while larger models achieve higher scores with more concise traces.

Figure 4: Acc@16 (average score) and Len@16 (average response length) for DeepSeek-Distill 1.5B–14B models on math benchmarks.
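The two metrics in the caption can be sketched as simple averages. The code assumes that Acc@k and Len@k mean the per-problem average over k sampled responses, then averaged over problems. That reading, the function name, and the sample data (with k = 4 for brevity) are assumptions for illustration.

```python
from statistics import mean

def acc_and_len_at_k(samples_per_problem):
    """Average correctness and average length over k samples per problem.

    samples_per_problem: list of problems, each a list of
    (is_correct, num_tokens) tuples, one tuple per sampled response.
    """
    per_problem_acc = [
        mean(1.0 if ok else 0.0 for ok, _ in samples)
        for samples in samples_per_problem
    ]
    per_problem_len = [
        mean(n for _, n in samples) for samples in samples_per_problem
    ]
    return mean(per_problem_acc), mean(per_problem_len)

# Hypothetical results for two problems with k = 4 samples each.
results = [
    [(True, 100), (True, 200), (False, 300), (False, 400)],
    [(True, 100), (True, 100), (True, 100), (True, 100)],
]
acc, avg_len = acc_and_len_at_k(results)
print(acc, avg_len)
```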

Figure 5: Token occurrence counts for DeepSeek-R1-Distill-Qwen-{1.5B, 7B, 14B} models, illustrating size-dependent epistemic token usage.

Test-Time and Training-Time Controls: Epistemic Tokens and Dataset Structure

Manipulating epistemic token generation at test time reveals substantial effects on performance. Suppressing epistemic tokens in high-reasoning models leads to a 25% drop in accuracy, but models compensate via alternative expressions of uncertainty, demonstrating the underlying diversity of epistemic verbalization. Conversely, inducing epistemic tokens in procedural-only models shows only marginal improvement unless integrated throughout the reasoning trajectory via few-shot prompting.
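A common way to suppress specific tokens at decoding time is to set their logits to negative infinity before the softmax. The sketch below uses that technique on a hypothetical four-token vocabulary with made-up logits. The summary does not state the paper's exact suppression mechanism, so this is an illustration rather than a reproduction.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def suppress_tokens(logits, banned_ids):
    """Return a copy of logits with banned token ids set to -inf."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]

# Hypothetical vocabulary and next-token logits.
vocab = ["So", "Wait", "Hmm", "Therefore"]
logits = [1.0, 2.0, 0.5, 1.5]
banned = {vocab.index("Wait")}

probs_before = softmax(logits)
probs_after = softmax(suppress_tokens(logits, banned))
for tok, before, after in zip(vocab, probs_before, probs_after):
    print(f"{tok:>10}: {before:.3f} -> {after:.3f}")
```

Banning one surface token redistributes its probability mass over the remaining tokens, including other uncertainty markers such as "Hmm". This is consistent with the observation that models compensate through alternative expressions of uncertainty.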

Figure 6: Comparison of epistemic token prevention and induction, demonstrating performance drops and limited gains from token-level interventions.

Controlled distillation experiments using variants of the LIMO dataset, with and without epistemic verbalization, further illuminate the informational role of uncertainty. Models fine-tuned on traces devoid of epistemic verbalization suffer severely degraded reasoning performance even though the solutions remain procedurally correct, which contradicts the notion that correct procedural traces are sufficient. The performance drop is more severe than under test-time token suppression, establishing epistemic verbalization as critical for reasoning and control.

Figure 7: Per-sample counts of epistemic tokens in the LIMO dataset; responses are rich in epistemic verbalizations, especially "Wait".
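Per-sample counts of this kind can be approximated with a simple word-boundary match. In the sketch below only "Wait" and "Hmm" come from the text above; the rest of the marker list and the sample response are hypothetical, and the paper's own token list may differ.

```python
import re
from collections import Counter

# Assumed marker list: "wait" and "hmm" appear in the text, the others are illustrative.
EPISTEMIC_MARKERS = ["wait", "hmm", "maybe", "perhaps"]
_PATTERN = re.compile(r"\b(" + "|".join(EPISTEMIC_MARKERS) + r")\b", re.IGNORECASE)

def count_epistemic_markers(text):
    """Count case-insensitive whole-word occurrences of each marker in text."""
    return Counter(m.group(1).lower() for m in _PATTERN.finditer(text))

# Hypothetical response fragment.
sample = "Wait, that gives 12. Hmm, maybe I dropped a sign. Wait, let me recheck."
counts = count_epistemic_markers(sample)
print(counts)
```

Surface counts like these only approximate epistemic verbalization, since uncertainty can also be expressed without any of the listed markers.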

Figure 8: Comparison of AIME24 pass@1 scores between base models and SFT models trained on the LIMO-v2 dataset.

Distributional Alignment and Distillation Outcomes

Evaluating alignment between curated datasets and base model characteristics, the paper demonstrates that effective distillation requires the base model's pre-existing epistemic properties to support the dataset's epistemic signals. Token-level log probability and entropy analyses reveal that successful distillation occurs only when epistemic tokens are within the support of the student model's distribution; otherwise, distillation outcomes are inconsistent or negative.

Figure 9: Cumulative distributions of log-probabilities and entropy for all tokens versus epistemic tokens, illustrating alignment and support constraints.
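A support check of this kind can be sketched by comparing empirical cumulative distributions of the student's log-probabilities on all tokens versus epistemic tokens. The log-probability values, the epistemic flags, and the cutoff of -5.0 are hypothetical. In practice the log-probabilities would come from scoring the dataset's traces with the student model.

```python
def empirical_cdf_at(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(v <= threshold for v in values) / len(values)

# Hypothetical per-token log-probabilities under the student model,
# with a flag marking which tokens are epistemic markers.
token_logprobs = [-0.1, -0.3, -7.0, -0.2, -9.0, -0.5]
is_epistemic = [False, False, True, False, True, True]

epistemic_logprobs = [lp for lp, flag in zip(token_logprobs, is_epistemic) if flag]

cutoff = -5.0  # assumed cutoff for "effectively outside the student's support"
low_all = empirical_cdf_at(token_logprobs, cutoff)
low_epistemic = empirical_cdf_at(epistemic_logprobs, cutoff)
print(f"all tokens below cutoff: {low_all:.2f}, epistemic tokens: {low_epistemic:.2f}")
```

A much larger low-probability fraction for epistemic tokens than for tokens overall would suggest that the dataset's epistemic signals fall outside what the student already produces.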

Practical and Theoretical Implications

This information-theoretic framework provides a unified perspective on reasoning dynamics, resolving ambiguities around Aha moments, reflection, and surface-level token manipulations. Distinguishing between procedural and epistemic axes clarifies the conditions under which self-correction can arise, and explains mixed empirical results from prior studies.

From a practical standpoint, post-training, distillation, and RL-based optimization should account for both procedural task span and epistemic verbalization capacity. Effective dataset curation, tag-based annotation, and capability-aware scheduling are recommended to align model development with target reasoning competencies. Compressing chain-of-thought traces must preserve epistemic signals; indiscriminate reduction may eliminate useful uncertainty expressions, particularly in models with limited procedural span.

Theoretically, the framework extends to world-Bayesian settings, generalizing to tool-augmented or agentic LLMs, where epistemic verbalization operates alongside environment-facing actions and external evidence acquisition. The decomposition of information axes supports new evaluation metrics and training strategies for uncertainty-aware reasoning models.

Conclusion

The paper reframes reasoning in LLMs as strategic information allocation under uncertainty, with epistemic verbalization as a central informational axis enabling continued information acquisition and control. Empirical and theoretical analyses demonstrate that explicit uncertainty externalization is essential for robust reasoning, self-correction, and distillation outcomes. The framework integrates prior fragmented observations and provides actionable guidance for model training, evaluation, and future research directions in AI reasoning systems.
