Information stored in future-position attention sinks in diffusion language models
Determine what type of information is encoded by attention-sink tokens at future (still-masked) positions during the iterative denoising process of masked discrete diffusion language models with bidirectional attention, such as those studied in this work, in order to clarify the role and content of these future-position sinks.
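One way to make the question concrete is to measure, at each denoising step, how much attention mass flows into still-masked positions and whether a few of them dominate (i.e., act as sinks). The sketch below is illustrative only: it uses a toy random attention map rather than a real diffusion model, and the `sink_scores` helper and the boosted key position are assumptions for demonstration.

```python
import numpy as np

def sink_scores(attn, mask_positions):
    """Average attention mass each key position receives, across heads and queries.

    attn: (heads, seq, seq) array; each query row sums to 1.
    mask_positions: boolean (seq,) array marking still-masked ("future") tokens.
    Returns per-position incoming mass and the subset at masked positions.
    """
    incoming = attn.mean(axis=0).mean(axis=0)  # mean over heads, then queries -> (seq,)
    return incoming, incoming[mask_positions]

# Toy attention map where key position 5 (a masked token) is made a sink.
rng = np.random.default_rng(0)
H, S = 4, 8
logits = rng.normal(size=(H, S, S))
logits[:, :, 5] += 4.0  # boost all queries' attention toward key position 5
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

masked = np.zeros(S, dtype=bool)
masked[4:] = True  # positions 4..7 not yet denoised

incoming, masked_incoming = sink_scores(attn, masked)
print(int(incoming.argmax()))  # the dominant sink position
```

In a real analysis one would extract `attn` from the model at each denoising step and track whether the same masked positions remain dominant as the mask shrinks, which is a prerequisite for probing what information those sink activations carry.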
References
While our empirical analysis offers a general overview of sink behaviour in DLMs, it also raises several open questions. First, it remains unclear what type of information the model stores in the sinks that correspond to future positions.
— Attention Sinks in Diffusion Language Models
(arXiv:2510.15731, Rulli et al., 17 Oct 2025), Section: Future Work