Training Optimal Large Diffusion Language Models
Published 28 Sep 2025 in cs.LG, cs.AI, and cs.CL | (2510.03280v2)
Abstract: We introduce Quokka, the first systematic scaling law for diffusion LLMs (DLMs), encompassing both compute-constrained and data-constrained regimes and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla and covers a wider scope. We hope the results will offer short-term practical guidance for DLM training and long-term inspiration for the whole AI community.