Unclear effectiveness of ARM-to-MDM adaptation for building strong foundation models
Determine whether fine-tuning autoregressive language models (ARMs) into masked diffusion models (MDMs), as in the adaptation approach of Gong et al. (2024), can produce a foundation language model whose performance under comprehensive evaluation is comparable to that of strong large language models (LLMs).
References
However, improvements are confined to certain metrics, and it remains unclear whether this approach can yield a foundation model comparable to strong LLMs under a comprehensive evaluation.
— Large Language Diffusion Models
(2502.09992 - Nie et al., 14 Feb 2025) in Related Work, Section 6