Stable injection of residual context into diffusion LLM input embeddings

Develop a method to inject soft-token residual context into the input embeddings of diffusion large language models that preserves the discrete masking scheme and avoids unstable recursive dependencies during training.

Background

Soft tokens represent mixtures of vocabulary embeddings and can therefore carry fine-grained contextual information before the model commits to a discrete token. However, directly adding such soft tokens to the input embeddings of diffusion LLMs can distort the masked denoising setup and create unstable recursive dependencies.
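The "mixture of vocabulary embeddings" idea can be made concrete with a minimal sketch. Everything here is illustrative: the codebook `E`, its sizes, and the helper `soft_token` are hypothetical toy constructs, not the paper's implementation.

```python
# Toy sizes; a real diffusion LLM has a vocabulary of tens of thousands
# of tokens and embedding dimensions in the hundreds or thousands.
VOCAB, DIM = 4, 3

# Hypothetical embedding codebook: one row per vocabulary token.
E = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]

def soft_token(probs, codebook):
    """A soft token is the probability-weighted mixture of the vocabulary
    embeddings: expected embedding under the model's current distribution."""
    dim = len(codebook[0])
    return [sum(p * row[d] for p, row in zip(probs, codebook))
            for d in range(dim)]

p = [0.7, 0.2, 0.1, 0.0]   # model's current belief over the vocabulary
v = soft_token(p, E)        # fractional context, prior to commitment
```

Because `v` interpolates between embedding rows, it encodes *how* the probability mass is split, which a single hard token cannot.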

The authors state this challenge explicitly as an open question and then propose RCD, which constructs residual vectors from the model’s embedding codebooks and injects them with entropy-based weighting to stabilize both training and inference.
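The excerpt does not give RCD's exact weighting formula, so the following is a hedged sketch of one plausible entropy-based gate: scale the residual by `1 - H(p)/log V`, so a near-uniform (uninformative) prediction leaves the masked input embedding essentially untouched, while a confident prediction injects the residual almost at full strength. The function names and the gating form are assumptions for illustration, not the paper's method.

```python
import math

def entropy_weight(probs):
    """Assumed confidence gate in [0, 1]: ~1 for a near-one-hot
    distribution, ~0 for a uniform one. RCD's actual weighting
    may differ from this normalized-entropy form."""
    eps = 1e-12  # guard against log(0)
    h = -sum(p * math.log(p + eps) for p in probs if p > 0)
    return 1.0 - h / math.log(len(probs))

def inject_residual(mask_embed, residual, probs):
    """Add the residual context vector to the [MASK] input embedding,
    scaled by the entropy-based weight, so uninformative predictions
    preserve the discrete masking scheme."""
    w = entropy_weight(probs)
    return [m + w * r for m, r in zip(mask_embed, residual)]

confident = [0.97, 0.01, 0.01, 0.01]  # low entropy -> strong injection
uniform = [0.25, 0.25, 0.25, 0.25]    # max entropy -> ~no injection
```

The appeal of such a gate is that it degrades gracefully: early in denoising, when predictions are diffuse, the input stays close to the plain mask embedding, avoiding the recursive feedback the Background paragraph warns about.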

References

However, naively adding the soft tokens to input embeddings disrupts the discrete masking scheme dLLMs rely on and creates unstable recursive dependencies during training, leaving injecting residual context into dLLMs an open question for RCD to solve.

Residual Context Diffusion Language Models (2601.22954 - Hu et al., 30 Jan 2026) in Section 2.2 (Soft Tokens)