Identify Dream model characteristics causing diminished RL policy performance
Determine which characteristics of the Dream-7B-Instruct diffusion language model contribute to the diminished performance of unmasking policies trained with reinforcement learning for masked diffusion sampling.
References
Understanding which characteristics of Dream (e.g., it being initialized from an AR model) contribute to the diminished performance of RL policies is an important open question.
— Learning Unmasking Policies for Diffusion Language Models
(2512.09106 - Jazbec et al., 9 Dec 2025) in Conclusion — Limitations and future work