- The paper proposes a CCD framework that addresses suboptimal DLM decoding by using a trajectory rectification mechanism based on conditional mutual information.
- It introduces an adaptive sampling strategy that dynamically allocates decoding budgets, achieving up to a 3.48× speedup and a 3.91% performance boost.
- Experimental results on tasks like GSM8K, MATH, HumanEval, and MBPP validate the method’s efficiency improvements and robustness.
Adaptive and Coherent Decoding for Diffusion LLMs
Overview of Diffusion LLMs
Diffusion LLMs (DLMs) represent a significant advancement over traditional autoregressive models because they leverage bidirectional context, incorporating information from all positions simultaneously rather than being restricted to a left-to-right view. This architectural shift potentially enhances the global awareness and long-term planning capabilities of LLMs. Unlike autoregressive models, DLMs generate semantically meaningful tokens iteratively from input mask tokens based on previously decoded contexts. However, conventional methods for DLM inference often rely on immediate-step metrics such as confidence or entropy, which can lead to inconsistent sampling trajectories and suboptimal generation quality.
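As a point of reference, the conventional single-step strategy can be sketched as follows: at each iteration, the most confident masked positions are unmasked greedily. The function name and array shapes are our illustration, not an API from the paper.

```python
import numpy as np

def confidence_decode_step(logits, is_masked, k=1):
    """One step of conventional confidence-based DLM decoding (illustrative
    sketch): unmask the k masked positions whose single-step predictive
    distribution is most confident, and commit their greedy tokens."""
    # Softmax over the vocabulary at each position.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    confidence = probs.max(axis=-1)        # per-position max probability
    confidence[~is_masked] = -np.inf       # only masked positions compete
    chosen = np.argsort(confidence)[-k:]   # top-k most confident positions
    tokens = probs[chosen].argmax(axis=-1) # greedy token at each position
    return chosen, tokens
```

Because this rule looks only at the current iteration, an early confident-but-wrong commitment is never revisited, which is exactly the failure mode CCD targets.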
Recent innovations in DLMs have focused on expanding their capabilities with models such as Dream and LLaDA, which operate at billions of parameters and can match the performance of autoregressive models while offering better long-term planning. Despite these advancements, the inference process, particularly the sampling procedure, remains an area with significant potential for optimization.
Coherent Contextual Decoding Framework
The Novel Approach
The paper introduces Coherent Contextual Decoding (CCD), a framework specifically designed to address the deficiencies in current DLM inference methods. CCD employs a trajectory rectification mechanism, theoretically grounded in modeling the contribution of historical steps as conditional mutual information between the context and token predictions. This mechanism allows suboptimal paths to be rejected early and enhances sequence coherence. CCD also addresses the inefficiency of the conventional uniform decoding budget by introducing an adaptive sampling strategy that dynamically adjusts the budget based on the consistency metric, improving the quality of generation trajectories while accelerating sampling.
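The rectification idea can be pictured with a toy consistency signal that compares the current predictive distribution against the buffered historical ones. The KL-based formula below is our simplification for illustration, not the paper's exact conditional-mutual-information criterion.

```python
import numpy as np

def consistency_score(current_probs, history):
    """Illustrative consistency signal between the current predictive
    distribution and buffered historical distributions (a simplification
    of CCD's mutual-information-based criterion, not the exact formula).
    Returns a value in (0, 1]; 1 means perfect agreement with history."""
    hist_mean = np.mean(history, axis=0)   # average of buffered distributions
    eps = 1e-12                            # numerical floor for the log
    # KL(current || historical mean); small divergence => consistent.
    kl = np.sum(current_probs * np.log((current_probs + eps) / (hist_mean + eps)))
    return float(np.exp(-kl))
```

A token whose score stays low across iterations is a candidate for early rejection, since its prediction keeps disagreeing with the context established by recent steps.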
Figure 1: Framework of our proposed method. We define a historical buffer H_t at iteration t that stores the predictive distributions from the most recent d iterations (except for the current iteration t) with only the top-V most confident tokens at each iteration.
Methodological Insights
The CCD framework redefines DLM inference as a contextual consistency-aware, multi-step decoding process. The adaptive sampling component of the framework is particularly notable, as it replaces rigid diffusion steps with dynamic budget allocations guided by the consistency metric. This approach not only theoretically connects the single-step predictive distribution with the sampling error bound but also enhances performance across various benchmarks, delivering up to a 3.48× speedup and a 3.91% performance improvement.
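The adaptive allocation can be sketched as a simple rule that spends more of the per-step budget when the consistency metric is high. The threshold and the linear schedule below are illustrative assumptions, not the paper's actual rule.

```python
def adaptive_budget(consistency, base=1, max_tokens=8, threshold=0.5):
    """Sketch of consistency-guided budget allocation (our illustration of
    the idea behind CCD's dynamic sampling, not the paper's exact rule):
    decode more tokens per step when recent predictions agree, fewer when
    the context is still unstable."""
    if consistency < threshold:
        return base  # unstable context: commit only the minimum
    # Linearly scale the budget from `base` up to `max_tokens`.
    frac = (consistency - threshold) / (1.0 - threshold)
    return base + int(round(frac * (max_tokens - base)))
```

Under such a rule, easy stretches of the sequence are filled in quickly with large budgets, while contested regions get the small, careful steps that uniform schedules would waste everywhere.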
In terms of implementation, the use of a sliding-window historical buffer is central to managing predictive distributions efficiently. This buffer retains only the most recent and informative predictions, automatically filtering out noise from early diffusion steps. This results in a computationally efficient process, as it avoids the extensive memory overhead typically associated with full distribution storage.
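A minimal sketch of such a buffer, assuming a fixed window of d iterations and top-V filtering per iteration (class and parameter names are ours):

```python
from collections import deque
import numpy as np

class HistoryBuffer:
    """Sliding-window buffer over the last d iterations, keeping only the
    top-V most confident positions per iteration (a minimal sketch of the
    buffer described in the paper; names are ours)."""

    def __init__(self, d=4, top_v=2):
        self.window = deque(maxlen=d)  # oldest entries are evicted automatically
        self.top_v = top_v

    def push(self, probs):
        """probs: (positions, vocab) predictive distribution for one step."""
        conf = probs.max(axis=-1)                # per-position confidence
        keep = np.argsort(conf)[-self.top_v:]    # top-V positions only
        self.window.append({int(i): probs[i] for i in keep})

    def __len__(self):
        return len(self.window)
```

The `deque(maxlen=d)` gives the sliding window for free: pushing a new step silently drops the oldest one, so noisy early-diffusion predictions age out without any explicit cleanup.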
Experimental Evaluation
Setup and Results
The proposed CCD framework was evaluated using prominent DLMs such as LLaDA and Dream across several benchmarks, including mathematical reasoning (GSM8K and MATH), code generation (HumanEval and MBPP), and planning tasks (Trip Plan). The experiments demonstrated a significant enhancement in performance metrics across all benchmarks when utilizing the CCD method. Notably, CCD achieves robust compatibility with the block-wise decoding scheme of the LLaDA series on GSM8K and MATH benchmarks.
Further optimization was achieved with the context-dynamically varied sampling strategy (CCD-DS), which delivered significant inference speedup without compromising accuracy. For example, on the Dream model, efficiency improved by up to 3.78× on MBPP while performance scores were maintained or improved.

Figure 2: Hyperparameter analysis showing the trade-off between score and computational steps as buffer size varies.
Implications and Future Directions
The development of CCD represents a meaningful step forward in the optimization of DLM inference processes, providing a framework that can be integrated with existing methods to improve both efficiency and accuracy. The ability to dynamically adjust the sampling strategy based on contextual consistency allows for more intelligent resource allocation during inference, potentially reducing computational demands and improving speed.
Future research could explore expanding the capabilities of CCD further, potentially integrating more sophisticated mutual information metrics or exploring its applicability to multilingual or multimodal models. Additionally, deeper investigations into how CCD can be applied across varied architectures and tasks could yield broader applicability and enhance its robustness and versatility in real-world deployment scenarios.
Conclusion
The paper's exploration of adaptive and coherent decoding for diffusion LLMs via the CCD framework is a substantial contribution to the ongoing efforts to refine LLM inference processes. By grounding the approach in robust theoretical insights and demonstrating empirical successes across diverse benchmarks, CCD offers a promising pathway for future enhancements in AI processing efficiency and effectiveness.