Cause of domain-dependent speed–accuracy tradeoff in LLaDA2.1

Determine whether the observed domain-specific variation in LLaDA2.1 decoding performance—where Speedy Mode threshold settings yield high throughput with minimal accuracy loss in structured domains such as coding and math but degrade quality in general chat—stems primarily from an inherent model preference for structured data or from distributional characteristics of the training dataset.

Background

The paper introduces a dual-threshold “Draft-and-Edit” decoding scheme with two operating configurations: Speedy Mode (lower mask threshold, relying on token-to-token edits) and Quality Mode (more conservative thresholds).
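
The paper gives the scheme only at a high level, so the following is a minimal sketch of one way a dual-threshold draft-and-edit loop could work, assuming confidence-based parallel unmasking: a commit threshold controls how aggressively masked positions are filled (lower in Speedy Mode), and an edit threshold controls when already committed tokens may be rewritten. All names here (draft_and_edit_decode, commit_threshold, edit_threshold, MASK_ID, the toy model) are illustrative assumptions, not the paper's implementation.

```python
import torch

MASK_ID = 0  # placeholder mask-token id; a real tokenizer's id will differ


def draft_and_edit_decode(model, seq_len,
                          commit_threshold=0.5,  # lower value = Speedy Mode
                          edit_threshold=0.9,    # confidence needed to rewrite a committed token
                          max_steps=64):
    """Decode a fully masked sequence by alternating draft and edit passes.

    `model(tokens)` is assumed to return per-position logits of shape
    (seq_len, vocab_size). Illustrative only.
    """
    tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long)
    stats = {"steps": 0, "edits": 0}
    for _ in range(max_steps):
        stats["steps"] += 1
        probs = torch.softmax(model(tokens), dim=-1)
        conf, pred = probs.max(dim=-1)
        masked = tokens == MASK_ID

        # Draft pass: commit every masked position above the commit threshold.
        commit = masked & (conf >= commit_threshold)
        if masked.any() and not commit.any():
            # Guarantee progress: commit the single most confident masked token.
            commit[conf.masked_fill(~masked, -1.0).argmax()] = True

        # Edit pass: rewrite committed tokens the model now confidently disagrees with.
        edit = ~masked & (pred != tokens) & (conf >= edit_threshold)
        stats["edits"] += int(edit.sum())

        tokens = torch.where(commit | edit, pred, tokens)
        if not (tokens == MASK_ID).any() and not edit.any():
            break
    return tokens, stats


if __name__ == "__main__":
    torch.manual_seed(0)
    vocab = 100
    toy_model = lambda toks: torch.randn(toks.shape[0], vocab)  # stand-in network
    out, stats = draft_and_edit_decode(toy_model, seq_len=16, commit_threshold=0.35)
    print(out, stats)
```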

Empirical results show substantial speed variation across domains, with notably high throughput on coding benchmarks but lower and less stable throughput on instruction-following tasks. The authors note a persistent speed–accuracy tradeoff and suggest domain-specific tuning of thresholds.
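
Concretely, the domain-specific tuning the authors suggest could amount to a small lookup of operating points per domain. The numbers below are placeholders chosen only to mirror the reported pattern (aggressive in code and math, conservative in chat), building on the sketch above.

```python
# Hypothetical per-domain operating points; the values are placeholders,
# not the paper's tuned settings.
DOMAIN_THRESHOLDS = {
    "code": {"commit": 0.35, "edit": 0.85},  # aggressive drafting holds up here
    "math": {"commit": 0.40, "edit": 0.85},
    "chat": {"commit": 0.70, "edit": 0.95},  # conservative, protects quality
}


def decode_for_domain(model, seq_len, domain):
    cfg = DOMAIN_THRESHOLDS.get(domain, DOMAIN_THRESHOLDS["chat"])  # safe default
    return draft_and_edit_decode(model, seq_len,
                                 commit_threshold=cfg["commit"],
                                 edit_threshold=cfg["edit"])
```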

They explicitly conjecture that the domain-dependent behavior may be driven either by the model’s inherent preference for structured data (e.g., code, math) or by the distributional properties of the training data, and indicate that further validation is needed.
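
Since the authors leave validation open, one hedged starting point is to check whether Speedy Mode edit rates per domain track each domain's share of the training mix (which would favor the distributional explanation) or stay low on structured inputs regardless of that share (which would favor an inherent preference). The helper below, probe_conjecture, and its inputs models_by_domain and train_share are hypothetical scaffolding on top of the sketch above.

```python
def probe_conjecture(models_by_domain, train_share, seq_len=32):
    """Compare Speedy-Mode edit rates against training-data domain shares.

    `models_by_domain` maps a domain name to a model callable as above;
    `train_share` maps a domain name to its (assumed known) fraction of the
    training mix. Both are stand-ins for real artifacts.
    """
    rows = []
    for domain, model in models_by_domain.items():
        _, stats = draft_and_edit_decode(model, seq_len, commit_threshold=0.35)
        edit_rate = stats["edits"] / max(stats["steps"], 1)
        rows.append((domain, train_share.get(domain, 0.0), edit_rate))
    return rows
```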

References

“Our conjecture is that this pattern may be related to the model's inherent preference for structured data or the distributional characteristics of training dataset.”

LLaDA2.1: Speeding Up Text Diffusion via Token Editing (arXiv:2602.08676, Bie et al., 9 Feb 2026), “Outlook and Limitation” section, “Tradeoff Between Inference Speed and Accuracy” paragraph.