Homogenization of distant [MASK] prediction distributions
Determine whether, in mask diffusion language models, the predicted marginal distributions over the token vocabulary at [MASK] positions become almost identical at sufficiently large distances when the sequence of [MASK] tokens is infinite. Also determine whether, for a fixed sequence length, the same near-identical distributions appear at the middle positions of the sequence.
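One way to probe the question above empirically is to compare the predicted marginals at neighbouring [MASK] positions and check whether their divergence shrinks toward zero with distance. The following is a minimal sketch, not the method of the cited paper: the choice of Jensen-Shannon divergence, the function names, and the synthetic marginals (a position-specific distribution blended into a shared prior with exponentially decaying weight) are all illustrative assumptions. With a real model, the `marginals` array would instead hold the softmax outputs at each [MASK] position.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two probability vectors."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = float(np.sum(p * np.log(p / m)))
    kl_qm = float(np.sum(q * np.log(q / m)))
    return 0.5 * kl_pm + 0.5 * kl_qm


def adjacent_divergences(marginals):
    """Divergence between each pair of neighbouring positions.

    marginals: array of shape (num_masks, vocab_size), one distribution per row.
    """
    return np.array([
        js_divergence(marginals[i], marginals[i + 1])
        for i in range(len(marginals) - 1)
    ])


def synthetic_marginals(num_masks=64, vocab_size=50, decay=0.2, seed=0):
    """HYPOTHETICAL stand-in for model outputs (an assumption, not real data).

    Row d mixes a position-specific distribution with a shared prior; the
    position-specific weight decays as exp(-decay * d).
    """
    rng = np.random.default_rng(seed)
    prior = rng.dirichlet(np.ones(vocab_size))
    rows = []
    for d in range(num_masks):
        w = np.exp(-decay * d)
        local = rng.dirichlet(np.ones(vocab_size))
        rows.append(w * local + (1.0 - w) * prior)
    return np.stack(rows)


marginals = synthetic_marginals()
divs = adjacent_divergences(marginals)
print("first adjacent divergence:", divs[0])
print("last adjacent divergence: ", divs[-1])
```

Under the conjecture, a plot of `divs` against position would flatten near zero for distant positions. For the fixed-length case, the same measurement could be applied across the whole sequence to see whether the near-zero region sits in the middle.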
References
Conjecture. At sufficiently large distances with an infinite length of [MASK] tokens, the distributions become almost identical. With a fixed given length, this near-identical behavior appears in the middle parts of the sequence.
— Why mask diffusion does not work
(2510.03289 - Sun et al., 29 Sep 2025) in Conjecture, Homogenization of Distant Mask Predictions (within Section 3.2.3: Marginal Distributions as a Function of Distance, following the table)