Entropy-informed Decoding: Adaptive Information-Driven Branching

Published 10 May 2026 in cs.LG, cs.AI, and cs.IT (2605.09745v1)

Abstract: LLMs achieve remarkable generative performance, yet their output quality depends on the decoding strategy. While sampling-based methods (e.g., top-k, nucleus) and search-and-select based methods (e.g., beam search, best-of-n, majority voting) can improve upon greedy decoding, both approaches suffer from limitations: sampling generally commits to a single path, while search often expends excessive computation regardless of task complexity. To address these limitations, we introduce Entropy-informed decoding (EDEN), a plug-and-play, model-agnostic decoding framework that adaptively allocates computation based on the model's own uncertainty, approximating higher-width beam search with fewer expansions. At each generation step, EDEN estimates the entropy of the output token distribution and adjusts the branching factor monotonically with the entropy, expanding more candidates in high-entropy regions and following a greedier path in low-entropy regions, improving token efficiency. Experiments across complex tasks, including mathematical reasoning, code generation, and scientific questions, demonstrate that EDEN consistently improves output quality over existing decoding strategies, achieving better accuracy-expansion trade-offs than fixed-width beam search. By treating next-token selection as a noisy maximisation problem, we prove that branching factors monotone in entropy are guaranteed to find better (i.e. more probable) continuations than any fixed branching factor within the same total expansion budget, and derive explicit regret rates characterising the benefit of the adaptive allocation.


Authors (3)

Summary

The paper introduces EDEN, an entropy-guided decoding framework that adjusts branching factors based on token uncertainty.
It maps normalized entropy to the branching factor through a piecewise-linear function, cutting expansions by roughly 50% relative to fixed-width beam search while matching or exceeding its accuracy.
Empirical evaluations on benchmarks such as GSM8K and HumanEval (code generation) corroborate EDEN's theoretical guarantees and its advantage over fixed-width beam search.


Overview

The paper "Entropy-informed Decoding: Adaptive Information-Driven Branching" (2605.09745) presents EDEN, a principled, model-agnostic decoding framework for LLMs that allocates compute adaptively according to the model's uncertainty at each generation step. Its core idea is a monotone mapping from the normalized entropy of the next-token distribution to the branching factor, which concentrates computation at high-uncertainty steps and thereby approximates higher-width beam search with fewer expansions.

Motivation and Limitations of Prior Decoding Approaches

Traditional decoding strategies for LLMs such as greedy decoding, sampling-based methods (top-k, nucleus, min-p), and search-and-select approaches (beam search, best-of-n, majority voting) each suffer from specific limitations. Sampling-based approaches often make irreversible token selections and neglect plausible low-probability continuations; search-based methods allocate compute uniformly across all generation steps, neglecting the variability in reasoning complexity and leading to unnecessary computational overhead in simple contexts.

Recent advances in entropy-informed branching (e.g., entropy-aware model switching, entropy thresholding, entropy-bounded sampling), while leveraging information-theoretic measures, primarily rely on heuristic or binary decision mechanisms. They lack a theoretically grounded and continuous way to adapt allocation. The paper introduces a monotone allocation rule justified by formal regret analysis, explicitly tying branching to stepwise entropy and providing guarantees on sample complexity for entropy estimation, enabling robust adaptation even in closed-source or API-limited scenarios.

Methodology

EDEN estimates the normalized entropy $H_t \in [0, 1]$ of the next-token distribution at each generation step. The branching factor $B_t$ is then set by the piecewise-linear rule $B_t = \max(1, B_{\max} \cdot H_t)$, where the hyperparameter $B_{\max}$ caps the number of candidates expanded per step. Low-entropy steps therefore reduce to (near-)greedy decoding, while high-entropy steps trigger broader exploration, concentrating compute at reasoning forks. A theoretically motivated pruning criterion based on admissible upper and lower bounds on normalized sequence scores further improves efficiency.
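The allocation rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes $H_t$ over the full softmax and divides by $\log V$, rounds $B_{\max} \cdot H_t$ to the nearest integer and clamps it to $[1, B_{\max}]$ (the summary does not say how fractional values are handled), expands the top-$B_t$ tokens of each hypothesis separately, and keeps at most `beam_cap` hypotheses per step. The names `normalized_entropy`, `branching_factor`, `eden_step`, `beam_cap`, and the toy model `toy_logits` are hypothetical, and the pruning criterion is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Beam = List[Tuple[List[int], float]]  # (token ids, cumulative log-probability)


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def normalized_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy of softmax(logits) divided by log(V), so it lies in [0, 1]."""
    if len(logits) < 2:
        return 0.0
    logp = log_softmax(logits)
    h = -sum(math.exp(lp) * lp for lp in logp)
    return h / math.log(len(logits))


def branching_factor(h_t: float, b_max: int) -> int:
    """B_t = max(1, B_max * H_t); rounding and clamping to [1, B_max] are assumptions."""
    return min(b_max, max(1, round(b_max * h_t)))


def eden_step(beam: Beam, next_logits: Callable[[List[int]], List[float]],
              b_max: int, beam_cap: int) -> Tuple[Beam, int]:
    """Expand each hypothesis by its own entropy-dependent B_t; return (new beam, expansions)."""
    candidates: Beam = []
    expansions = 0
    for tokens, score in beam:
        logits = next_logits(tokens)
        b_t = branching_factor(normalized_entropy(logits), b_max)
        logp = log_softmax(logits)
        top = sorted(range(len(logp)), key=lambda i: logp[i], reverse=True)[:b_t]
        candidates.extend((tokens + [i], score + logp[i]) for i in top)
        expansions += b_t
    # Keep the highest-scoring hypotheses (beam_cap is a hypothetical retention limit).
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_cap], expansions


def toy_logits(tokens: List[int]) -> List[float]:
    """Hypothetical 4-token model: peaked on even-length prefixes, flat on odd ones."""
    return [5.0, 0.0, 0.0, 0.0] if len(tokens) % 2 == 0 else [0.0, 0.0, 0.0, 0.0]


beam: Beam = [([], 0.0)]
counts = []
for _ in range(4):
    beam, n = eden_step(beam, toy_logits, b_max=4, beam_cap=4)
    counts.append(n)
print(counts)  # expansions per step: [1, 4, 4, 16]
```

On the toy model, peaked steps expand a single child per hypothesis while flat steps expand the full $B_{\max}$, which is the compute shift described above. Computing $B_t$ per hypothesis is one reading of the rule; a single step-level entropy would be an equally plausible variant.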

Theoretical Results

The paper shows that entropy-monotone branching incurs lower cumulative decision regret than any fixed-width beam search with the same total expansion budget. Formally, the expected regret $\mathbb{E}[R_T]$ decays exponentially in the stepwise compute allocation $m_t$ and the effective gap, so the adaptive rule improves on static allocation whenever entropy varies across steps. Explicit cumulative regret bounds are derived under assumptions of sub-Gaussian estimation noise, bounded rewards, and distributional Lipschitz continuity. For closed-source settings, the sample complexity of entropy estimation attains a near-optimal rate, and moderate estimation error does not degrade branching decisions.
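To see why error can decay exponentially in the per-step allocation, consider the simplest noisy-maximisation instance: two candidate tokens whose true scores differ by a gap $\Delta$, each observed through $m$ i.i.d. Gaussian samples with standard deviation $\sigma$, with the higher empirical mean selected. The difference of the two means is Gaussian with mean $\Delta$ and variance $2\sigma^2/m$, so the probability of picking the worse candidate is at most $\exp(-m\Delta^2 / (4\sigma^2))$. The Monte Carlo sketch below checks this textbook bound. It illustrates the mechanism only and is not the paper's theorem; `gap`, `sigma`, and `m` are our own stand-ins for the effective gap and the allocation $m_t$.

```python
import math
import random


def selection_error_rate(gap: float, sigma: float, m: int, trials: int,
                         rng: random.Random) -> float:
    """Fraction of trials in which the worse candidate has the higher mean of m noisy scores."""
    errors = 0
    for _ in range(trials):
        best = sum(rng.gauss(gap, sigma) for _ in range(m)) / m   # true score = gap
        other = sum(rng.gauss(0.0, sigma) for _ in range(m)) / m  # true score = 0
        if other >= best:
            errors += 1
    return errors / trials


def gaussian_tail_bound(gap: float, sigma: float, m: int) -> float:
    """Upper bound exp(-m * gap^2 / (4 * sigma^2)) on the two-candidate error probability."""
    return math.exp(-m * gap ** 2 / (4 * sigma ** 2))


rng = random.Random(0)  # fixed seed for reproducibility
results = {}
for m in (1, 4, 16):
    emp = selection_error_rate(gap=0.5, sigma=1.0, m=m, trials=4000, rng=rng)
    bound = gaussian_tail_bound(gap=0.5, sigma=1.0, m=m)
    results[m] = (emp, bound)
    print(f"m={m:2d}  empirical error={emp:.3f}  bound={bound:.3f}")
```

Raising $m$ shrinks both the empirical error and the bound quickly, so a fixed budget is better spent on steps where candidates are hard to separate (roughly, high-entropy steps) than spread evenly.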

Empirical Evaluation

EDEN is evaluated across tasks demanding complex, multi-step reasoning: mathematical problem solving (GSM8K, MATH-500), code generation (HumanEval), and scientific QA (SciBench), using models from multiple families (Llama-3.2-3B-Instruct, Gemma-3, IBM Granite, Mistral-7B). Comparisons span greedy, sampling-based, and search-based methods with standardized hyperparameters and compute budgets.

EDEN consistently achieves the best average accuracy rank while using substantially fewer expansions than fixed-width beam search. A Bayesian hierarchical analysis assigns a 75% posterior probability that EDEN is the best method overall, and pairwise probabilities of at least 96% that it outperforms each alternative baseline. EDEN matches or exceeds beam-search accuracy with roughly 50% fewer expansions, and improves accuracy over sampling and majority voting by $+2$ to $+11$ percentage points (absolute). Overall, it shifts the accuracy-expansion Pareto frontier upward, corroborating the theoretical claims.

Dynamic allocation is validated empirically: without access to task-difficulty labels or reward feedback, EDEN automatically increases expansions on more complex tasks (e.g., from GSM8K to SciBench), supporting entropy as a proxy for step hardness. Entropy estimation is also robust in API-limited settings: EDEN applied to top-$k$ subsets of logits yields accuracy gains over both greedy decoding and top-$k$ sampling, with estimation accuracy stabilizing for reasonable values of $k$ and the expansion thresholds.
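When only the top-$k$ log-probabilities are available, as with some hosted APIs, the entropy has to be estimated from that subset. One simple estimator, assumed here rather than taken from the paper, renormalizes the $k$ probabilities and divides their entropy by $\log k$. The function names, the dictionary input format, and the example log-probabilities are hypothetical, and the rounding in `branching_factor` is likewise an assumption.

```python
import math
from typing import Dict


def topk_normalized_entropy(top_logprobs: Dict[str, float]) -> float:
    """Entropy of the renormalized top-k distribution, divided by log(k) (assumed estimator)."""
    k = len(top_logprobs)
    if k < 2:
        return 0.0
    m = max(top_logprobs.values())
    weights = [math.exp(lp - m) for lp in top_logprobs.values()]
    total = sum(weights)
    probs = [w / total for w in weights]  # renormalize over the k visible tokens
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def branching_factor(h_t: float, b_max: int) -> int:
    """B_t = max(1, B_max * H_t), rounded and clamped to [1, B_max] (assumption)."""
    return min(b_max, max(1, round(b_max * h_t)))


# Hypothetical top-3 log-probabilities at a confident step and an uncertain step.
confident = {"4": math.log(0.97), "5": math.log(0.02), "3": math.log(0.01)}
uncertain = {"Paris": math.log(0.40), "Lyon": math.log(0.35), "Nice": math.log(0.25)}

for name, lp in (("confident", confident), ("uncertain", uncertain)):
    h = topk_normalized_entropy(lp)
    print(f"{name}: H_t={h:.3f}  B_t={branching_factor(h, b_max=4)}")
```

Because the probability mass outside the top $k$ is discarded, this estimate can differ from the full-vocabulary value; here it still separates the confident step ($B_t = 1$) from the uncertain one ($B_t = 4$).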

Robustness and Limitations

EDEN is robust to prompt paraphrasing and moderate model miscalibration: the shape of the model's confidence, as reflected in entropy, is preserved under typical perturbations and temperature scaling, which keeps branching decisions stable. The framework is agnostic to the base scorer as long as per-step rewards are additive and bounded and distributional Lipschitz continuity holds; the paper also demonstrates extensions to reward-augmented decoding with process reward models.
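Temperature scaling divides every logit by the same positive constant $T$, so it can never reorder the candidate tokens; it only sharpens ($T < 1$) or flattens ($T > 1$) the distribution, and the entropy of $\mathrm{softmax}(z/T)$ increases with $T$ whenever the logits are not all equal. The short check below illustrates this on hypothetical logits. It does not reproduce the paper's robustness experiments, and all names are our own.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(V), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def ranking(probs: Sequence[float]) -> List[int]:
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits
temps = (0.5, 1.0, 2.0)
rankings = []
entropies = []
for t in temps:
    p = softmax(logits, t)
    rankings.append(ranking(p))
    entropies.append(normalized_entropy(p))
    print(f"T={t}: ranking={rankings[-1]}  H_t={entropies[-1]:.3f}")
```

Since candidates are selected by rank, rescaling leaves the identity of the top tokens unchanged even though $H_t$, and hence how many of them are expanded, shifts with $T$.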

Limitations include the added cost of computing entropy at every step, heterogeneous branching factors that complicate batching and parallelization, and limited experiments on larger models due to compute constraints. The approach also assumes that output distributions encode meaningful uncertainty, so poorly calibrated or undertrained LLMs may not benefit from entropy-adaptive allocation.

Implications and Future Directions

EDEN formalizes adaptive compute allocation in search-based decoding, providing provable advantages over conventional fixed-width strategies with minimal modifications to inference procedures. Practically, this enables more efficient usage of computational resources in LLM deployment by concentrating search on ambiguous points, enhancing generation quality in zero-shot, complex reasoning settings, and improving efficiency for API-constrained or expensive inference environments.

Theoretically, monotone entropy-based allocation advances the understanding of stepwise uncertainty in sequential generation, offering a tractable proxy for computational hardness and facilitating plug-and-play integration. Future research could integrate EDEN with custom scoring, diversity-guided objectives, external reward models, or hierarchical branching for tool invocation and semantic abstraction, broadening its applicability in agentic LLMs and complex, multi-modal reasoning tasks.

Conclusion

The paper introduces EDEN, which achieves adaptive, information-driven branching in LLM decoding, backed by theoretical guarantees and empirical gains over existing methods on complex reasoning tasks. By using entropy to guide search allocation, EDEN offers an efficient and principled alternative to fixed-width search in generative language modeling, with both practical and theoretical implications for the field.
