
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models (2508.00819v2)

Published 1 Aug 2025 in cs.CL

Abstract: Diffusion LLMs (DLLMs) are emerging as a powerful alternative to the dominant Autoregressive LLMs, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length allocation leads to a problematic trade-off: insufficient lengths cripple performance on complex tasks, while excessive lengths incur significant computational overhead and sometimes result in performance degradation. While the inference framework is rigid, we observe that the model itself possesses internal signals that correlate with the optimal response length for a given task. To bridge this gap, we leverage these latent signals and introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion for Diffusion LLMs. DAEDAL operates in two phases: 1) Before the denoising process, DAEDAL starts from a short initial length and iteratively expands it to a coarse task-appropriate length, guided by a sequence completion metric. 2) During the denoising process, DAEDAL dynamically intervenes by pinpointing and expanding insufficient generation regions through mask token insertion, ensuring the final output is fully developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves performance comparable, and in some cases superior, to meticulously tuned fixed-length baselines, while simultaneously enhancing computational efficiency by achieving a higher effective token ratio. By resolving the static length constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap with their Autoregressive counterparts and paving the way for more efficient and capable generation.

Summary

  • The paper introduces DAEDAL, a training-free variable-length denoising strategy that dynamically adapts sequence lengths using model confidence signals.
  • It leverages an initial short, fully masked sequence and iterative mask insertion to refine outputs during the diffusion process.
  • Experiments on LLaDA-Instruct-8B demonstrate enhanced computational efficiency and improved response quality over fixed-length baselines.

Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion LLMs

Introduction

Diffusion LLMs (DLLMs) have emerged as a significant alternative to Autoregressive LLMs, mainly due to their efficient parallel generation and comprehensive global context modeling. Yet, DLLMs face a critical constraint: the need for a statically predefined generation length, which imposes an unsatisfactory trade-off between computational efficiency and task performance. The paper introduces DAEDAL, a novel, training-free denoising strategy for Dynamic Adaptive Length Expansion in DLLMs, eliminating the need for a statically predefined length at inference time. Figure 1

Figure 1: Overview of DAEDAL's effectiveness on LLaDA-Instruct-8B. (a) DAEDAL uses a unified and short initial length, consistently surpassing the baseline, which needs its length meticulously tuned for each benchmark to achieve peak performance. (b) DAEDAL dynamically adjusts length and adaptively expands on a per-problem basis, resulting in a varied distribution of response lengths. In contrast, the baseline is constrained to a fixed length for all problems.

Methodology

Diffusion LLMs Overview

DLLMs use a diffusion-based denoising process with bidirectional attention to transform a fully masked initial sequence into coherent text. Unlike autoregressive models, DLLMs start from a fixed-length, fully masked sequence, which makes the inference process rigid and leaves no room for length adaptability. This is problematic because it prevents the dynamic test-time scaling at which autoregressive models excel.
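For contrast with DAEDAL, the following is a minimal sketch of this fixed-length denoising loop. It assumes a hypothetical `model` that returns per-position logits over the full sequence, an illustrative mask-token id, and a simplified confidence-based unmasking schedule; it is not any specific DLLM's implementation.

```python
import torch

MASK_ID = 126336  # illustrative mask-token id; the real id is tokenizer-specific


def fixed_length_denoise(model, prompt_ids, gen_length=256, steps=64):
    """Standard fixed-length DLLM inference: append a fully masked block of a
    predefined, static size and iteratively unmask the most confident positions."""
    device = prompt_ids.device
    # Response region is a fully masked block whose length is fixed up front.
    x = torch.cat([prompt_ids, torch.full((gen_length,), MASK_ID, device=device)])

    tokens_per_step = max(1, gen_length // steps)
    for _ in range(steps):
        mask_pos = (x == MASK_ID).nonzero(as_tuple=True)[0]
        if mask_pos.numel() == 0:
            break
        logits = model(x.unsqueeze(0)).logits[0]         # (seq_len, vocab_size)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)  # per-position confidence
        # Commit only the most confident masked positions at this step.
        top = conf[mask_pos].topk(min(tokens_per_step, mask_pos.numel())).indices
        x[mask_pos[top]] = pred[mask_pos[top]]
    return x[prompt_ids.numel():]
```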

DAEDAL Mechanism

DAEDAL consists of two primary phases:

  1. Initial Length Adjustment: DAEDAL begins with a short initial length and relies on the model's confidence in predicting the End-of-Sequence (EOS) token to guide expansion. The average EOS confidence measured after a first prediction on the fully masked sequence serves as the signal for assessing whether the current length is adequate for the task. Figure 2

    Figure 2: Visualization of the DLLM's awareness of length sufficiency. The heatmaps show the difference in average EOS token confidence at the sequence terminus, measured after the first prediction on a fully masked 128-token input.

  2. Iterative Mask Insertion: During denoising, DAEDAL dynamically expands the sequence at positions with low prediction confidence by inserting mask tokens. This on-demand local refinement adapts the length during generation to accommodate complex reasoning requirements (a simplified sketch of both phases follows this list). Figure 3

    Figure 3: Inference process of Fixed-Length Denoising (Baseline) and DAEDAL. (a) The standard inference process for current DLLMs, which performs iterative denoising on a sequence of a predefined, static length. (b) Our proposed two-stage inference process.
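The sketch below illustrates the two phases described above. The threshold names (`tau_eos`, `tau_fill`), the expansion size, and the mask/EOS ids are assumptions for illustration; the paper's exact sequence completion metric and expansion rule may differ.

```python
import torch

MASK_ID = 126336    # illustrative mask-token id
EOS_ID = 2          # illustrative EOS-token id
EXPAND_TOKENS = 32  # mask tokens added per expansion (assumed value)


def eos_confidence(model, x, tail=8):
    """Average EOS probability over the last `tail` positions after one forward pass."""
    probs = model(x.unsqueeze(0)).logits[0].softmax(dim=-1)
    return probs[-tail:, EOS_ID].mean().item()


def initial_length_expansion(model, prompt_ids, init_len=64, max_len=1024, tau_eos=0.5):
    """Phase 1: start short and grow the fully masked block until the model is
    confident the response can terminate within the allotted length."""
    masks = torch.full((init_len,), MASK_ID, device=prompt_ids.device)
    x = torch.cat([prompt_ids, masks])
    while x.numel() - prompt_ids.numel() < max_len and eos_confidence(model, x) < tau_eos:
        x = torch.cat([x, torch.full((EXPAND_TOKENS,), MASK_ID, device=x.device)])
    return x


def denoise_with_mask_insertion(model, x, steps=64, tau_fill=0.9):
    """Phase 2: unmask positions the model is confident about; where confidence
    stays low, insert extra mask tokens so that region can develop further."""
    for _ in range(steps):
        mask_pos = (x == MASK_ID).nonzero(as_tuple=True)[0]
        if mask_pos.numel() == 0:
            break
        conf, pred = model(x.unsqueeze(0)).logits[0].softmax(dim=-1).max(dim=-1)
        confident = mask_pos[conf[mask_pos] >= tau_fill]
        if confident.numel() > 0:
            x[confident] = pred[confident]
        else:
            # No masked position is confident enough: expand the least confident
            # region by inserting additional mask tokens at that position.
            worst = mask_pos[conf[mask_pos].argmin()]
            pad = torch.full((EXPAND_TOKENS,), MASK_ID, device=x.device)
            x = torch.cat([x[:worst], pad, x[worst:]])
    return x
```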

Experiments and Results

The experiments were conducted on LLaDA-Instruct-8B across multiple benchmarks. DAEDAL adapted generation lengths dynamically and surpassed statically fixed baselines while improving computational efficiency and the effective token ratio.

The results further show that DAEDAL allocates task-appropriate lengths, producing a diverse distribution of response lengths that tracks task complexity more closely than any single fixed length. Figure 4

Figure 4: Distribution of individual Response Lengths ($N_{\text{token}}$) on LLaDA-Instruct-8B. DAEDAL's dynamic adaptation results in a varied length distribution compared to the fixed-length baseline.

Analysis

Threshold Sensitivity

DAEDAL's robustness was evaluated across varied configurations of the two thresholds that govern token-level filling and sequence-level length adjustment. Even with diverse threshold settings, DAEDAL maintained high accuracy, indicating that it does not require extensive hyperparameter tuning (a small grid-search sketch is given after Figure 5). Figure 5

Figure 5: Ablation Results on DAEDAL's Thresholds. The heatmaps present grid search results over interdependent threshold pairs, highlighting stability across configurations.
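A minimal sketch of how such a threshold grid search could be run, assuming a hypothetical `evaluate(tau_fill, tau_eos)` callable that runs DAEDAL with the given thresholds and returns benchmark accuracy; the grid values are illustrative, not the paper's.

```python
import itertools


def threshold_grid_search(evaluate, fill_values=(0.80, 0.85, 0.90, 0.95),
                          eos_values=(0.3, 0.4, 0.5, 0.6)):
    """Evaluate every (tau_fill, tau_eos) pair and return the best configuration."""
    results = {(f, e): evaluate(f, e)
               for f, e in itertools.product(fill_values, eos_values)}
    best = max(results, key=results.get)
    return best, results
```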

Conclusion

DAEDAL effectively addresses the static length limitation of DLLMs, introducing a dynamic, training-free approach that leverages intrinsic model signals for length adaptability. This strategic flexibility not only enhances computational efficiency but also aligns DLLMs more closely with autoregressive capabilities for diverse generation tasks. Future avenues may explore integrating DAEDAL's framework with other generation mechanisms, fostering further advancements in language modeling.
