Source and aggregation of residual contextual signals in block-wise diffusion LLMs

Determine the appropriate source of residual contextual signals in block-wise diffusion large language models (token predictions, full predictive distributions, or hidden states), and establish a principled aggregation mechanism for injecting these signals across denoising iterations without destabilizing decoding.
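
To make the three candidate sources concrete, the sketch below shows how each could be read off one denoising step in PyTorch. The shapes (`block_len`, `vocab_size`, `d_model`) and tensor names are illustrative assumptions, not quantities from the paper.

```python
import torch

# Illustrative shapes (assumptions, not from the paper).
block_len, vocab_size, d_model = 32, 50_000, 1024

# Stand-ins for one denoising step's outputs over a block.
logits = torch.randn(block_len, vocab_size)   # pre-softmax scores per position
hidden = torch.randn(block_len, d_model)      # final-layer hidden states

# The three candidate residual sources named above:
token_preds = logits.argmax(dim=-1)           # (1) hard token predictions
full_dists = logits.softmax(dim=-1)           # (2) full predictive distributions
states = hidden                               # (3) raw hidden states
```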

Background

The paper highlights that state-of-the-art block-wise diffusion LLMs use remasking to discard low-confidence tokens at each denoising step, which both wastes computation and throws away informative intermediate distributions. The authors propose Residual Context Diffusion (RCD) to recycle these discarded signals by converting them into residual context and re-injecting them into subsequent steps.
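
For reference, here is a minimal sketch of the confidence-based remasking the paper describes: the most confident positions are committed and the rest are reset to the mask token, so the predictive distributions at remasked positions are simply thrown away. The `mask_id` argument, `keep_ratio`, and the top-k rule are assumptions standing in for whatever remasking schedule a given model uses.

```python
import torch

def remask_step(logits: torch.Tensor, mask_id: int, keep_ratio: float = 0.5) -> torch.Tensor:
    """Commit high-confidence tokens, remask the rest. The signal at remasked
    positions is discarded here, which is the waste RCD targets."""
    probs = logits.softmax(dim=-1)                    # [block_len, vocab]
    conf, preds = probs.max(dim=-1)                   # per-position confidence and argmax token
    k = max(1, int(keep_ratio * logits.size(0)))      # how many positions to commit this step
    keep = conf.topk(k).indices
    tokens = torch.full((logits.size(0),), mask_id, dtype=torch.long)
    tokens[keep] = preds[keep]                        # everything else stays masked
    return tokens
```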

Before introducing RCD’s entropy-weighted residual mechanism and two-stage training, the authors explicitly note that it is unclear what the best source of residual signals should be (e.g., token outputs, distributions, hidden states) and how these signals ought to be aggregated, motivating their design choices.
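
The paper names an entropy-weighted residual mechanism, but this summary does not specify its form, so the sketch below is one plausible reading under stated assumptions: convert each position's predictive distribution into an expected embedding, down-weight it by its entropy, and add the scaled result to the next step's inputs. The exponential weighting, the `alpha` scale, and the injection point are all assumptions, not the authors' method.

```python
import torch

def entropy_weighted_residual(probs: torch.Tensor,
                              embed_table: torch.Tensor,
                              alpha: float = 0.1) -> torch.Tensor:
    """One hypothetical aggregation: distribution -> expected embedding,
    scaled down where the model is uncertain (high entropy).
    probs:       [block_len, vocab] predictive distributions
    embed_table: [vocab, d_model]   token embedding matrix
    returns:     [block_len, d_model] residual context added to next-step inputs
    """
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)  # [block_len]
    weight = torch.exp(-entropy).unsqueeze(-1)                    # confident -> weight near 1
    expected = probs @ embed_table                                # soft token embedding
    return alpha * weight * expected                              # small, stability-friendly residual
```

Scaling by a small `alpha` and by confidence keeps the injected signal weak where the model is unsure, which is one simple way to address the "without destabilizing decoding" requirement in the problem statement.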

References

“It is unclear where residual contextual signals should come from and how they should be aggregated.”

Residual Context Diffusion Language Models (arXiv:2601.22954, Hu et al., 30 Jan 2026), Section 1 (Introduction).