
Discrete Diffusion in Large Language and Multimodal Models: A Survey (2506.13759v2)

Published 16 Jun 2025 in cs.LG and cs.AI

Abstract: In this work, we provide a systematic survey of Discrete Diffusion LLMs (dLLMs) and Discrete Diffusion Multimodal LLMs (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception. These capabilities were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second contributing domain is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements have catalyzed a surge in dLLMs and dMLLMs research in early 2025. In this work, we present a comprehensive overview of the research in the dLLM and dMLLM domains. We trace the historical development of dLLMs and dMLLMs, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey

Summary

  • The paper demonstrates that discrete diffusion models can enhance inference speed by up to ten times compared to autoregressive models.
  • The paper details innovative techniques such as parallel decoding and dynamic masking, which improve controllability and performance.
  • The paper outlines future challenges in training infrastructure and privacy, paving the way for responsible AI deployment.

Discrete Diffusion in Large Language and Multimodal Models: A Survey

The survey "Discrete Diffusion in Large Language and Multimodal Models: A Survey" provides an extensive analysis of Discrete Diffusion LLMs (dLLMs) and Discrete Diffusion Multimodal LLMs (dMLLMs), contrasting them with their autoregressive counterparts. It highlights the capacity of discrete diffusion models to accelerate inference by up to ten times relative to autoregressive models, positioning discrete diffusion as a promising alternative paradigm owing to its parallel generation mechanism and improved output controllability.

The paper is anchored in two domains that have catalyzed the advancement of dLLMs and dMLLMs. The first is the prolific development of autoregressive models, which has accumulated a wealth of datasets, benchmarks, and foundational architectures that transfer readily to the discrete diffusion setting. The second is the evolution of the underlying mathematical models, which shifted from continuous-space formulations to discrete-space formulations built on absorbing (masking) states, making discrete diffusion models more scalable and easier to engineer. Together, these developments set in motion a wave of d(M)LLM research in early 2025.
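
To make the absorbing-state idea concrete, the sketch below shows the forward (noising) process of a masked discrete diffusion model, in which each token is independently replaced by a special mask token with a probability set by the noise schedule. The function name, tensor shapes, and schedule variable are illustrative assumptions, not the survey's own notation:

```python
import torch

def forward_mask(x0: torch.Tensor, alpha_t: float, mask_id: int) -> torch.Tensor:
    """Absorbing-state forward process (illustrative sketch).

    Each token of x0 is kept with probability alpha_t and replaced by the
    mask token with probability 1 - alpha_t. As alpha_t shrinks toward 0
    over diffusion time, the sequence becomes fully masked.
    x0: (batch, seq_len) LongTensor of token ids.
    """
    keep = torch.rand(x0.shape, device=x0.device) < alpha_t
    return torch.where(keep, x0, torch.full_like(x0, mask_id))
```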

The authors further delve into the technical anatomy of discrete diffusion models, explaining how they operate over the large discrete token spaces of LLMs and how multimodal variants accommodate visual input. The survey identifies improvements in parallel decoding, dynamic perception, and generation controllability as the principal drivers behind the growing adoption of discrete diffusion models.
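
As a rough illustration of the parallel decoding paradigm described above, the following sketch denoises all masked positions at once with full attention and commits only the most confident predictions at each step. The `model` interface, tensor shapes, and confidence-based selection rule are assumptions for illustration, not any specific model's API:

```python
import torch

@torch.no_grad()
def parallel_decode(model, prompt_ids, gen_len, mask_id, steps=8):
    """Confidence-based parallel decoding sketch for a masked diffusion LM.

    Assumes model(ids) returns logits of shape (1, seq_len, vocab_size)
    computed with full (bidirectional) attention over all positions.
    """
    ids = torch.cat([prompt_ids, torch.full((1, gen_len), mask_id)], dim=1)
    per_step = max(1, gen_len // steps)            # tokens to reveal per step
    for _ in range(steps):
        masked = ids == mask_id
        if not masked.any():
            break
        logits = model(ids)                        # denoise all positions in parallel
        conf, pred = logits.softmax(-1).max(-1)    # per-position confidence and argmax token
        conf = conf.masked_fill(~masked, -1.0)     # only consider still-masked positions
        top = conf.topk(min(per_step, int(masked.sum())), dim=-1).indices
        ids[0, top[0]] = pred[0, top[0]]           # commit the most confident predictions
    return ids
```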

The surveyed results indicate that d(M)LLMs often rival existing autoregressive LLMs and MLLMs, supporting their real-world viability: empirical evaluations report substantial inference acceleration alongside matching or near-matching performance on benchmarks spanning the language, vision-language, and biological domains.

The paper meticulously categorizes existing models, mapping out a clear timeline and progress trajectory for dLLMs and dMLLMs. It spans the underlying mathematical frameworks, influential models, and key techniques, culminating in a discussion of emerging and potential applications. Notable among these developments are the training and inference strategies that integrate masking scheduling, remasking, and guidance techniques to improve fine-tuning and inference flexibility.
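
The masking-schedule and remasking ideas mentioned above can be illustrated with a small sketch. The cosine schedule and low-confidence remasking rule shown here are common choices in the masked-diffusion literature, written under assumed tensor shapes, and are not the exact procedures of any particular model covered by the survey:

```python
import math
import torch

def cosine_mask_ratio(t: int, T: int) -> float:
    """Cosine masking schedule: fraction of generated tokens kept masked at
    step t of T. t = 0 -> fully masked, t = T -> fully revealed."""
    return math.cos(0.5 * math.pi * t / T)

def remask_low_confidence(ids, conf, mask_id, t, T, prompt_len):
    """Low-confidence remasking: after a denoising step, re-mask the generated
    tokens whose current confidence is lowest so they can be revised later.

    ids:  (1, seq_len) token ids; conf: (1, seq_len) per-position confidence
    from the latest denoising step; prompt tokens are never remasked.
    """
    gen_len = ids.shape[1] - prompt_len
    n_remask = int(cosine_mask_ratio(t, T) * gen_len)
    if n_remask == 0:
        return ids
    gen_conf = conf[:, prompt_len:]
    low = gen_conf.topk(n_remask, largest=False).indices + prompt_len
    ids = ids.clone()
    ids[0, low[0]] = mask_id
    return ids
```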

In closing, the authors outline future directions and challenges, such as strengthening training infrastructure and improving inference efficiency, and propose pathways through advanced architectural designs and deployment practices. They also flag security and privacy concerns arising from memorization and unwanted reproduction of training data, arguing that resolving these issues is crucial for responsible deployment.

In summary, the survey positions discrete diffusion models not merely as an alternative but as a capable contender in language and multimodal model development, with the potential to reshape how such systems are deployed at scale in forthcoming AI applications. The paper serves as a valuable resource for continued exploration, understanding, and advancement of discrete diffusion modeling.
