
DPad: Efficient Diffusion Language Models with Suffix Dropout (2508.14148v2)

Published 19 Aug 2025 in cs.CL and cs.LG

Abstract: Diffusion-based LLMs (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction. We propose Diffusion Scratchpad (DPad), a training-free method that restricts attention to a small set of nearby suffix tokens, preserving fidelity while eliminating redundancy. DPad integrates two strategies: (i) a sliding window, which maintains a fixed-length suffix window, and (ii) distance-decay dropout, which deterministically removes distant suffix tokens before attention computation. This simple design is compatible with existing optimizations such as prefix caching and can be implemented with only a few lines of code. Comprehensive evaluations across multiple benchmarks on LLaDA-1.5 and Dream models demonstrate that DPad delivers up to $\mathbf{61.4\times}$ speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference. Our code is available at https://github.com/Crys-Chen/DPad.


Summary

The paper introduces DPad, which uses suffix dropout to reduce computational redundancy in diffusion language models while keeping accuracy comparable.
It combines a sliding window with a distance-decay dropout strategy, achieving up to 61.4× speedup over vanilla dLLMs on models such as LLaDA-1.5 and Dream.
DPad is training-free and compatible with existing optimizations such as prefix caching, highlighting its potential for scalable and efficient LLM deployment.

DPad: Efficient Diffusion LLMs with Suffix Dropout

Overview

This paper introduces Diffusion Scratchpad (DPad), a training-free method that speeds up diffusion-based LLMs (dLLMs) through suffix dropout. DPad streamlines the denoising process of dLLMs by attending only to a small subset of nearby suffix tokens, which act as a "scratchpad," reducing computational overhead while keeping accuracy comparable to the unmodified model.

Methodological Insights

Diffusion LLMs (dLLMs)

Unlike conventional autoregressive models, which generate one token at a time from left to right, dLLMs frame text generation as a denoising process that predicts many masked tokens in parallel. The cost of this parallelism is redundancy: at each step the model predicts every future suffix token but retains only a small fraction of those predictions, so most of the suffix computation is discarded.
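To make the redundancy concrete, the sketch below counts how many masked positions a vanilla dLLM predicts compared with how many tokens it actually commits. It assumes a simplified setting that is not taken from the paper: a generation of `gen_len` tokens, a fixed number of tokens committed per step, and a full prediction over every still-masked position at each step. The helper name `prediction_overhead` is hypothetical.

```python
def prediction_overhead(gen_len: int, tokens_per_step: int) -> tuple[int, int]:
    """Return (masked positions predicted, tokens committed) over a full generation.

    Simplified model (assumption, not the paper's accounting): every step predicts
    all still-masked positions, including the whole suffix, but commits only
    `tokens_per_step` of them.
    """
    if tokens_per_step <= 0 or gen_len % tokens_per_step:
        raise ValueError("gen_len must be a positive multiple of tokens_per_step")
    steps = gen_len // tokens_per_step
    # Before step s, s * tokens_per_step tokens are committed; the rest are still masked.
    predicted = sum(gen_len - s * tokens_per_step for s in range(steps))
    return predicted, gen_len


predicted, committed = prediction_overhead(gen_len=256, tokens_per_step=1)
print(predicted, committed, predicted / committed)  # 32896 256 128.5
```

With 256 generated tokens and one token committed per step, this toy model makes 32,896 masked-position predictions to keep 256 tokens, roughly 128 predictions per committed token. The count depends only on how many positions remain masked at each step, not on the order in which they are unmasked.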

Scratchpad Mechanism

Suffix tokens in dLLMs serve as an information reservoir, collecting signals from prefix tokens. This paper likens the function of these tokens to a "scratchpad," providing contextual cues that assist in generating the current block. The redundancy observed in suffix tokens increases with their distance from the current block.
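The distance trend can be quantified from an attention map such as those in Figure 2. The sketch below is a hypothetical helper, not the authors' analysis code: it averages the attention that queries in the current block assign to suffix positions, grouped by distance from the end of the block. It assumes a row-normalized attention matrix for a single head and layer, given as nested Python lists, and uses a synthetic toy matrix in place of real model output.

```python
from collections import defaultdict


def suffix_attention_by_distance(attn, block_start, block_end, bucket=1):
    """Mean attention from current-block queries to suffix keys, keyed by distance bucket.

    attn: n x n attention matrix (rows = queries, columns = keys), assumed to be
    row-normalized softmax output for one head of one layer (hypothetical input).
    The current block occupies [block_start, block_end); the suffix is [block_end, n).
    Distance 0 is the first suffix token; `bucket` groups consecutive distances.
    """
    n = len(attn)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for q in range(block_start, block_end):
        for k in range(block_end, n):
            d = (k - block_end) // bucket
            totals[d] += attn[q][k]
            counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(totals)}


def toy_attention(n):
    """Synthetic attention whose weights fall off with |key - query| (toy data only)."""
    rows = []
    for q in range(n):
        w = [1.0 / (1 + abs(k - q)) for k in range(n)]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows


print(suffix_attention_by_distance(toy_attention(12), block_start=2, block_end=4, bucket=2))
```

If the scratchpad observation holds for a given head, the returned averages should generally shrink as the distance key grows.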

DPad: Efficiency Enhancements

DPad restricts suffix attention with two strategies:

Sliding Window: Only a fixed-length window of suffix tokens nearest the current block is kept, which bounds the per-step computation regardless of total sequence length.
Distance-decay Dropout: Before attention scores are computed, distant suffix tokens are deterministically removed, with the share of retained tokens decaying (following a Gaussian profile) as distance from the current block grows.
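A minimal sketch of how the two strategies could together select which suffix positions enter attention at a given step. The window size, the Gaussian width `sigma`, and the use of a fixed random seed to make the selection reproducible are all assumptions for illustration; the abstract only states that DPad keeps a fixed-length window and deterministically removes distant tokens. The function name `select_suffix_tokens` is hypothetical.

```python
import math
import random


def select_suffix_tokens(block_end, seq_len, window, sigma, seed=0):
    """Return the suffix positions kept for one decoding step (hypothetical helper).

    Sliding window: only positions in [block_end, block_end + window) are candidates.
    Distance-decay dropout (assumed Gaussian form): a candidate at distance d
    (0 = first suffix token) is kept with probability exp(-d**2 / (2 * sigma**2)).
    A fixed seed makes the choice reproducible, so identical calls drop identical tokens.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    rng = random.Random(seed)
    window_end = min(block_end + window, seq_len)
    kept = []
    for pos in range(block_end, window_end):
        d = pos - block_end
        keep_prob = math.exp(-d * d / (2 * sigma * sigma))
        # rng.random() is in [0, 1), so the token at d = 0 (probability 1) is always kept.
        if rng.random() < keep_prob:
            kept.append(pos)
    return kept


kept = select_suffix_tokens(block_end=32, seq_len=512, window=128, sigma=32.0)
print(len(kept), kept[:5])
```

Because the generator is seeded, repeated calls with the same arguments drop the same tokens. The window caps the number of candidates per step, and the decay spends most of what remains on tokens closest to the current block.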

Both strategies are compatible with existing optimizations such as prefix caching, and the whole design can be implemented in only a few lines of code.
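The compatibility with prefix caching can be illustrated with a rough cost model. In the sketch below, which is a simplification rather than how any particular implementation schedules its cache, a prefix cache means positions before the current block are not recomputed as queries but still serve as keys, while suffix dropout shrinks the set of suffix positions in both roles. The function `step_shape` and its arguments are hypothetical.

```python
from typing import Iterable, Optional


def step_shape(block_start: int, block_end: int, seq_len: int,
               kept_suffix: Optional[Iterable[int]] = None,
               use_prefix_cache: bool = False) -> tuple[int, int]:
    """Return (query positions, key positions) processed in one denoising step.

    Simplified cost model (assumption): [0, block_start) is the prompt plus
    already-decoded blocks, [block_start, block_end) is the current block, and the
    suffix is [block_end, seq_len), or only `kept_suffix` when dropout is applied.
    """
    prefix = list(range(0, block_start))
    block = list(range(block_start, block_end))
    suffix = list(range(block_end, seq_len)) if kept_suffix is None else list(kept_suffix)
    # With a prefix cache, prefix keys/values are reused instead of recomputed.
    queries = block + suffix if use_prefix_cache else prefix + block + suffix
    # Every query can still attend to the (cached or recomputed) prefix.
    keys = prefix + block + suffix
    return len(queries), len(keys)


for label, kept, cache in [("vanilla", None, False),
                           ("prefix cache", None, True),
                           ("prefix cache + 16 kept suffix tokens", range(288, 304), True)]:
    q, k = step_shape(256, 288, 1024, kept_suffix=kept, use_prefix_cache=cache)
    print(f"{label}: {q} queries x {k} keys = {q * k} attention scores")
```

In this toy setting the number of attention scores drops from 1,048,576 to 786,432 with caching alone and to 14,592 once the suffix is also pruned. These are illustrative counts under the stated model, not measured speedups; the point is that the two optimizations act on different parts of the sequence and can be combined.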

Figure 1: Comparison of (a) autoregressive LLMs, (b) block-wise diffusion LLMs, and (c) our DPad. DPad restricts suffix attention via (i) a sliding window and (ii) distance-decay dropout.

Evaluative Metrics and Results

The paper evaluates DPad on multiple benchmarks using the LLaDA-1.5 and Dream models, with the following findings:

Speed Improvements: DPad achieves up to $61.4\times$ speedup over vanilla dLLMs.
Architecture Compatibility: DPad is training-free, so it applies to existing dLLMs without retraining and works alongside optimizations such as prefix caching.
Accuracy Maintenance: Despite these speedups, DPad keeps accuracy comparable to the vanilla models.

Figure 2: Attention score maps illustrating the Scratchpad mechanism in dLLMs. The maps were generated by the LLaDA-1.5 model.

Implications and Future Directions

Because DPad is training-free and compatible with existing optimizations such as prefix caching, it can be applied directly to existing dLLMs to speed up inference, particularly for long sequences. Future research could integrate similar dropout strategies into training so that training and inference conditions match, which may further improve accuracy and efficiency.

In conclusion, DPad addresses a major bottleneck of diffusion-based language modeling, redundant suffix computation, by exploiting the observation that distant suffix tokens contribute little to the current block.


Authors (9)

GitHub

GitHub - Crys-Chen/DPad: Official implementation of "DPad: Efficient Diffusion Language Models with Suffix Dropout" (11 stars)
