Seed Diffusion Preview for Code Generation
- Seed Diffusion Preview is a discrete diffusion-based model that leverages block-wise parallel token generation to achieve high inference speeds of 2,146 tokens/s.
- It employs a two-stage curriculum with mask-based and edit-based corruption to ensure robust denoising and maintain competitive code generation quality.
- The model sets a new state of the art on the speed–quality Pareto frontier, outperforming competitors like Mercury Coder through optimized system design and algorithmic innovations.
Seed Diffusion Preview is a discrete diffusion-based large language model tailored for code generation, with an emphasis on high-throughput, non-sequential token sampling. It leverages parallel token generation to achieve substantial inference speedups over conventional autoregressive models while maintaining generation quality on standard coding benchmarks. The approach integrates innovations in diffusion curriculum training, block-wise decoding, and optimization infrastructure, positioning it at the frontier of speed–quality trade-offs in code generation.
1. Discrete Diffusion Model Architecture
Seed Diffusion Preview utilizes a dense Transformer architecture, as standard in LLMs, but is trained via a discrete diffusion process instead of autoregressive next-token prediction. At its core, the model operates by:
- Defining a forward process that starts from a clean token sequence and progressively corrupts it either via:
- Mask-based corruption (replacing tokens with a [MASK] symbol) for the initial 80% of training.
- Edit-based corruption (random insertions, deletions, substitutions controlling the Levenshtein signal-to-noise ratio) for the final 20% of training, to promote robust self-correction during denoising.
- Training a reverse process (the actual model) to reconstruct the uncorrupted sequence by iterative denoising.
- This two-stage curriculum ensures coverage of both realistic denoising and self-correction behaviors, which is crucial for the stability and efficacy of diffusion-based LLMs.
Mathematically, constrained-order training minimizes

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, t,\, x_t \sim q(x_t \mid x_0)}\Big[\omega(t) \sum_{i \,:\, x_t^i = [\mathrm{MASK}]} -\log p_\theta\big(x_0^i \mid x_t\big)\Big],$$

where $\omega(t)$ weights the loss by noise level and $q$ is the corruption function.
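The two corruption stages can be sketched in plain Python; `mask_corrupt`, `edit_corrupt`, and the toy vocabulary are illustrative names and simplifications, not the paper's actual implementation:

```python
import random

MASK = "[MASK]"

def mask_corrupt(tokens, noise_level, rng=None):
    """Stage 1 (mask-based): independently replace each token with
    [MASK] with probability equal to the noise level."""
    rng = rng or random.Random(0)
    return [MASK if rng.random() < noise_level else t for t in tokens]

def edit_corrupt(tokens, n_edits, vocab=("a", "b", "c"), rng=None):
    """Stage 2 (edit-based): apply up to n_edits random insertions,
    deletions, and substitutions, bounding the Levenshtein distance
    from the clean sequence and hence the edit signal-to-noise ratio."""
    rng = rng or random.Random(0)
    out = list(tokens)
    for _ in range(n_edits):
        op = rng.choice(["ins", "del", "sub"])
        if op == "ins":
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
        elif op == "del" and out:
            out.pop(rng.randrange(len(out)))
        elif op == "sub" and out:
            out[rng.randrange(len(out))] = rng.choice(vocab)
    return out
```

Training a denoiser against edit-corrupted inputs, rather than only masked ones, is what pushes the model toward self-correction: it must repair sequences rather than merely fill blanks.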
2. Parallel, Block-Level Token Generation
The model discards traditional left-to-right decoding in favor of block-wise parallel sampling. Generation begins from a fully masked sequence; in each diffusion step, a block of tokens is generated in parallel, conditioned on previously completed context blocks. This "semi-autoregressive" block-wise approach maintains causality across blocks but yields substantial acceleration because, within a block, all tokens are produced simultaneously.
This non-sequential decoding contrasts the autoregressive factorization $p(x) = \prod_{i=1}^{n} p(x_i \mid x_{<i})$ with a diffusion-based joint probability assigned via the learned reverse Markov chain trajectory.
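A minimal sketch of this semi-autoregressive loop, assuming a hypothetical `model` callable that predicts tokens for the masked positions of the current block (the real system runs batched denoising steps on GPU):

```python
MASK = "[MASK]"

def decode_blockwise(model, prompt, seq_len, block_size):
    """Blocks are generated left to right (causality across blocks),
    but all tokens inside a block are filled in parallel by iterative
    denoising, conditioned on previously completed blocks."""
    seq = list(prompt) + [MASK] * seq_len
    for start in range(len(prompt), len(seq), block_size):
        end = min(start + block_size, len(seq))
        # iterate until the denoiser has committed every token in the block
        while any(t == MASK for t in seq[start:end]):
            seq[start:end] = model(seq, start, end)
    return seq[len(prompt):]
```

With a toy denoiser that commits every masked slot in one step, an 8-token completion at `block_size=4` takes two model calls rather than eight, which is the source of the speedup.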
3. Inference Speed and System Infrastructure
A notable outcome is the inference rate of 2,146 tokens/s on H20 GPUs. Several system-level and algorithmic factors underlie this speed:
- Large block size during decoding amortizes the cost per forward pass, as computation for expanding a partial sequence to a larger block remains nearly constant compared to a full pass for each token.
- On-policy diffusion learning with a Monte Carlo gradient estimator is applied to reduce the expected number of diffusion steps at test time, further limiting decoding latency.
- The implementation is extensively tuned for parallel batch processing, minimizing fetch and recomputation overhead in GPU infrastructure.
Increasing block size directly reduces the relative forward-pass "shadow cost," as evidenced in the ablation presented in Figure 1(b) of the original work, resulting in an order-of-magnitude speedup over standard token-by-token approaches.
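The amortization argument can be made concrete with a back-of-the-envelope count of model forward passes; `steps_per_block` is an illustrative simplification (real diffusion decoding may need several refinement passes per block):

```python
import math

def forward_passes(seq_len, block_size, steps_per_block=1):
    """Rough count of forward passes needed to emit seq_len tokens:
    autoregressive decoding is the block_size=1 case (one pass per
    token); block-parallel decoding pays steps_per_block passes per
    block, so larger blocks amortize the per-pass cost."""
    return math.ceil(seq_len / block_size) * steps_per_block

# e.g. 1024 tokens: 1024 passes autoregressively, vs. 64 passes
# with 32-token blocks at 2 denoising steps per block
```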
4. Benchmark Performance and Quality
The model is evaluated over a suite of open code generation and editing benchmarks, including Aider, CanItEdit, MBXP, and NaturalCodeBench. Across these, Seed Diffusion Preview:
- Achieves pass@1 and other standard generation metrics that are highly competitive with leading large model baselines, including CodeLlama, DeepSeek, and CodeQwen.
- Demonstrates strong edit performance on program editing tasks (Aider, CanItEdit) comparable to much larger autoregressive models.
- Establishes that larger block sizes introduce minor quality trade-offs, but performance remains within a narrow margin of the highest-quality baselines.
The core learning objective, based on maximizing the ELBO, incorporates both the log-likelihood of the ground-truth sequence and a token-level reconstruction term:

$$\log p_\theta(x_0) \;\geq\; \mathbb{E}_{q}\big[\log p_\theta(x_0 \mid x_1)\big] \;-\; \sum_{t=2}^{T} \mathbb{E}_{q}\Big[D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\big\|\,p_\theta(x_{t-1} \mid x_t)\big)\Big].$$

This supports robust sequence-level and token-level reconstruction.
5. Speed–Quality Pareto Frontier
Seed Diffusion Preview establishes a new state of the art on the speed–quality Pareto frontier for code models. The Pareto frontier in this context is the trade-off boundary beyond which it is impossible to increase decoding speed without sacrificing output quality, or vice versa. Specifically:
- It achieves code generation quality scores nearly as high as (and sometimes higher than) the best autoregressive baselines, while offering a dramatically higher generation rate in measured tokens/s.
- Competing discrete diffusion models, such as Mercury Coder and Gemini Diffusion, are outperformed both in speed and sometimes in quality as measured on open evaluation benchmarks run on H20 GPUs.
This distinction is reinforced by the direct head-to-head evaluations, where Mercury and Gemini, while also block-sampling based, trail in measured inference throughput and, in some metrics, generation fidelity.
6. Implications for Large-Scale Code Generation
Seed Diffusion Preview demonstrates that discrete diffusion models are a viable and effective alternative to autoregressive Transformers for code generation at scale. The combination of parallel decoding and competitive program synthesis quality facilitates the deployment of high-throughput code LMs in environments where latency is paramount. System-level co-design alongside model-level algorithmic improvements appears essential for achieving such speeds without compromising sample quality.
A clear implication is that for code—where accuracy and editability are critical—diffusion-based models can match or exceed the practical capabilities of traditional LLMs, provided block-wise strategies and robust curricula are employed.
7. Comparative Analysis: Mercury Coder and Gemini Diffusion
Explicit comparison in the original research is as follows:
| Model | Hardware | Reported Inference Speed (tokens/s) | Benchmarks | Relative Quality |
|---|---|---|---|---|
| Seed Diffusion Preview | H20 GPUs | 2,146 | 8 open code tasks | Competitive/highest per test |
| Mercury Coder | H100 GPUs | Lower (exact figure not specified) | Proprietary | Competitive (fewer tasks open) |
| Gemini Diffusion | Unclear | Lower (averaged, unclear batch) | Mixed-task | Close but not higher |
All metrics and claims in this table are directly reflected in the underlying results; further, hardware and dataset differences are explicitly noted in the assessments.
Conclusion
Seed Diffusion Preview advances diffusion-based language modeling for code by offering an efficient block-level sampled Transformer architecture, trained with an innovative two-stage curriculum and order constraints. Its measured inference speed of 2,146 tokens/s, paired with competitive or superior benchmark performance, marks the current state of the art on the speed–quality Pareto frontier relative to models such as Mercury Coder and Gemini Diffusion (Song et al., 4 Aug 2025).