Uniform Noise Diffusion Models
- Uniform noise diffusion models are defined by using bounded, zero-mean, unit-variance uniform noise in place of Gaussian noise in the forward and reverse diffusion processes.
- They achieve theoretical parity with Gaussian models through careful moment matching and discretization, evidenced empirically by comparable FID and NLL metrics.
- Empirical studies demonstrate that these models perform competitively in image and text domains, particularly benefiting data-bound and compute-efficient applications.
Uniform noise diffusion models generalize classical diffusion-based generative modeling by introducing uniformly distributed, zero-mean, unit-variance noise in the forward or reverse diffusion process. Unlike conventional formulations that assume Gaussian (normal) noise increments, uniform noise diffusion models explore the use of uniform (or, more broadly, discrete or bounded) noise distributions in both continuous and discrete domains, including images and text. This paradigm is supported by theoretical analysis in the infinitesimal-step limit, algorithmic frameworks for practical discretizations, and empirical studies on model quality, efficiency, and scaling.
1. Mathematical Foundations and SDE Formulation
Continuous-time diffusion models are classically defined via stochastic differential equations (SDEs), most notably the variance-preserving SDE (VP-SDE)

$$dx_t = -\tfrac{1}{2}\beta(t)\, x_t\, dt + \sqrt{\beta(t)}\, dw_t,$$

where $\beta(t)$ is a non-decreasing noise schedule and $w_t$ is Brownian motion. The reverse-time SDE, when the score function $\nabla_x \log p_t(x)$ is approximated by a neural network $s_\theta(x, t)$, takes the form

$$dx_t = \left[-\tfrac{1}{2}\beta(t)\, x_t - \beta(t)\, s_\theta(x_t, t)\right] dt + \sqrt{\beta(t)}\, d\bar{w}_t,$$

with $\bar{w}_t$ a reverse-time Brownian motion.
Discretization is typically performed using Euler–Maruyama (EM) with $N$ steps of size $h = T/N$. For standard Gaussian noise, the EM update is

$$x_{k+1} = x_k + f_\theta(x_k, t_k)\, h + g(t_k)\, \sqrt{h}\, z_k, \qquad z_k \sim \mathcal{N}(0, I),$$

where $f_\theta$ and $g$ encode the drift and scaled diffusion coefficients as functions of time. In uniform noise models, the Brownian increment is replaced with

$$\xi_k \sim \mathrm{Uniform}\!\left(-\sqrt{3}, \sqrt{3}\right)^d,$$

ensuring $\mathbb{E}[\xi_k] = 0$ and $\mathrm{Var}(\xi_k) = I$, matching the moments of the Gaussian case. The update becomes

$$x_{k+1} = x_k + f_\theta(x_k, t_k)\, h + g(t_k)\, \sqrt{h}\, \xi_k.$$

Strong-error analysis using Grönwall's inequality yields a convergence rate of $O(\sqrt{h}) = O(N^{-1/2})$ under Lipschitz/regularity assumptions, demonstrating theoretical equivalence to Gaussian-driven models at matching discretization granularity (Choi et al., 10 Jun 2025, Li, 2024).
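As a concrete illustration, here is a minimal NumPy sketch of one reverse-time EM step under the VP-SDE above, assuming a hypothetical score network `score_fn` and schedule `beta`; the only change relative to the Gaussian sampler is how the increment `xi` is drawn.

```python
import numpy as np

rng = np.random.default_rng(0)

def em_reverse_step(x, t, h, beta, score_fn, noise="uniform"):
    """One reverse-time Euler-Maruyama step of size h at time t (illustrative)."""
    # reverse-time VP-SDE drift: f(x, t) - g(t)^2 * score(x, t)
    drift = -0.5 * beta(t) * x - beta(t) * score_fn(x, t)
    if noise == "uniform":
        # zero mean, unit variance, bounded support [-sqrt(3), sqrt(3)]
        xi = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=x.shape)
    else:
        xi = rng.standard_normal(size=x.shape)  # Gaussian baseline
    # step backwards in time, t -> t - h
    return x - drift * h + np.sqrt(beta(t) * h) * xi

# toy usage: linear beta schedule and the analytic score of a standard normal target
beta = lambda t: 0.1 + 19.9 * t
x = rng.standard_normal(16)
for k in range(500, 0, -1):
    x = em_reverse_step(x, k / 500, 1.0 / 500, beta, lambda x_, t_: -x_)
```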
2. Theoretical Properties: Invariance and Moment Matching
The reverse-time SDE is theoretically invariant to the noise law in the infinitesimal (continuous-time) regime, provided the noise has zero mean and unit variance. This invariance arises because, as the step size approaches zero, the cumulative effect of i.i.d. increments converges (by the central limit theorem) to Brownian motion, regardless of whether the increments are Gaussian, uniform, or other location-scale distributions with bounded moments:

$$\sqrt{h} \sum_{k \,:\, t_k \le t} \xi_k \;\xrightarrow{d}\; w_t \quad \text{as } h \to 0.$$

Thus, the choice of uniform increments $\xi_k \sim \mathrm{Uniform}(-\sqrt{3}, \sqrt{3})$ leads, in the limit, to the same Itô SDE as in the Gaussian case (Li, 2024).
In practice, it is essential to match the first two moments (zero mean, unit variance) and keep the support bounded (e.g., $[-\sqrt{3}, \sqrt{3}]$ for uniform) to preserve theoretical guarantees and avoid error amplification. Empirical evidence and ablation studies confirm that incorrect scaling (e.g., a variance mismatch) significantly degrades generation quality and sample diversity (Choi et al., 10 Jun 2025).
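A quick numerical check of the moment-matching requirement (illustrative, not from the cited papers): a $\mathrm{Uniform}(-\sqrt{3}, \sqrt{3})$ increment has mean zero and unit variance, scaled sums of such increments behave like Brownian motion, and an unscaled $\mathrm{Uniform}(-1, 1)$ increment mismatches the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 20_000, 200

u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, steps))
print(u.mean(), u.var())                       # ~0.0, ~1.0 (matches N(0, 1))

w1 = u.sum(axis=1) / np.sqrt(steps)            # scaled sum ~ Brownian motion at t = 1
print(w1.mean(), w1.var())                     # ~0.0, ~1.0

bad = rng.uniform(-1.0, 1.0, size=(n, steps))  # variance 1/3: mismatched scaling
print(bad.var())                               # ~0.333, would shrink the diffusion term
```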
3. Training Objectives and Algorithmic Formulations
For continuous data, the usual evidence lower bound (ELBO) or denoising score-matching loss can be adapted to uniform (or other) noising kernels by considering KL divergences between uniform rather than Gaussian kernels (or other distributions as desired). For EM discretizations with uniform increments, the per-step loss retains the standard score-matching form, with the time-dependent weighting unchanged from the Gaussian case and the injected noise drawn as $\xi_k \sim \mathrm{Uniform}(-\sqrt{3}, \sqrt{3})$ (Li, 2024).
For non-Gaussian noising, score-based loss formulations may be unavailable or intractable due to the lack of explicit reverse kernels $p(x_{t-h} \mid x_t)$; in these cases, method-of-moments objectives are used, matching the network output to the true noise mean and variance (Jolicoeur-Martineau et al., 2023).
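A generic sketch of such an objective, assuming the network predicts the per-step noise mean and variance; the names and structure are illustrative rather than the exact formulation of the cited work.

```python
import numpy as np

def method_of_moments_loss(pred_mean, pred_var, true_mean, true_var):
    """Penalize mismatch between predicted and true first two moments (sketch)."""
    return np.mean((pred_mean - true_mean) ** 2) + np.mean((pred_var - true_var) ** 2)
```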
For discrete diffusion models (including text), the USDM and GIDD frameworks define the forward noising process as a convex combination of the identity and a uniform categorical distribution over the vocabulary $V$, with a uniform prior at $t = 1$:

$$q_t(x_t \mid x_0) = \alpha_t\, \mathrm{Cat}(x_t; \delta_{x_0}) + (1 - \alpha_t)\, \mathrm{Cat}\!\left(x_t; \tfrac{1}{|V|}\mathbf{1}\right).$$

Loss design leverages a per-token cross-entropy for truly noised positions and may integrate contrastive-inspired negative sampling for additional training robustness (Zhu et al., 27 Oct 2025).
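Below is a minimal NumPy sketch of this forward corruption and of a per-token cross-entropy restricted to truly noised positions; the helper names (`uniform_noise_tokens`, `alpha_t`) and the masking convention are assumptions for illustration, not the cited implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_noise_tokens(x0, alpha_t, vocab_size):
    """Keep each token with probability alpha_t, otherwise resample uniformly over the vocabulary."""
    keep = rng.random(x0.shape) < alpha_t
    noise = rng.integers(0, vocab_size, size=x0.shape)
    return np.where(keep, x0, noise), ~keep  # (corrupted tokens, mask of noised positions)

def denoising_cross_entropy(logits, x0, noised_mask):
    """Cross-entropy against the clean tokens, averaged over truly noised positions only."""
    z = logits - logits.max(-1, keepdims=True)               # stable log-softmax
    logp = z - np.log(np.exp(z).sum(-1, keepdims=True))
    nll = -np.take_along_axis(logp, x0[..., None], -1)[..., 0]
    return (nll * noised_mask).sum() / max(noised_mask.sum(), 1)

# toy usage: batch of 2 sequences, length 8, vocabulary of 50 tokens
x0 = rng.integers(0, 50, size=(2, 8))
xt, mask = uniform_noise_tokens(x0, alpha_t=0.6, vocab_size=50)
loss = denoising_cross_entropy(rng.standard_normal((2, 8, 50)), x0, mask)
```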
4. Empirical Behavior and Comparative Performance
Quantitative results establish that, for image generation, sample quality as measured by FID is virtually unchanged between uniform, Rademacher, and Gaussian increments, provided exact moment-matching is enforced. On MNIST (T=500), FID scores are: Gaussian ≈ 2.99, Uniform ≈ 2.97, Rademacher ≈ 2.99, with similar results for triangular and arcsine distributions. Laplace (asymmetric) noise degrades markedly (FID ≈ 5.77), and incorrect scaling produces severe artifacts (Choi et al., 10 Jun 2025).
In density modeling on CIFAR-10, uniform-forward with a Gaussian reverse achieves near-identical FID (1.99 vs. Gaussian’s 1.98) but increases NLL by ≈0.3 bits/dim, confirming theoretical predictions about distributional differences beyond the second moment (Li, 2024). In contrast, employing uniform noise directly in the reverse process can lead to suboptimal results due to boundary effects and non-smooth score functions, with observed catastrophic degradation (FID > 200) in location-scale GDDIM models (Jolicoeur-Martineau et al., 2023).
For language modeling, uniform-state and GIDD-based discrete diffusion models with uniform noising exhibit stable scaling, competitive generative perplexity (Gen PPL), and improved token efficiency in data-bound regimes. USDMs with a simplified loss match or exceed prior masked-diffusion approaches on Gen PPL while offering stable and efficient few-step generation (Zhu et al., 27 Oct 2025, Rütte et al., 11 Dec 2025).
5. Practical Implementation, Sampling, and Complexity
Uniform noise models preserve the computational architecture of Gaussian DMs. Sampling follows an identical algorithmic structure, except that the noise sampling step uses i.i.d. uniform random numbers $\xi_k \sim \mathrm{Uniform}(-\sqrt{3}, \sqrt{3})^d$. Per-step complexity is $O(d)$ in addition to one network evaluation, and with strong error $O(N^{-1/2})$ the number of steps required to reach error $\varepsilon$ scales as $O(\varepsilon^{-2})$ (Choi et al., 10 Jun 2025).
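For intuition on the $O(\varepsilon^{-2})$ step count, a tiny sketch with a hypothetical constant `c` absorbing the regularity-dependent factors: halving the target strong error roughly quadruples the number of steps.

```python
import math

def steps_for_strong_error(eps, c=1.0):
    # strong error ~ c * N**(-1/2)  =>  N ~ (c / eps)**2
    return math.ceil((c / eps) ** 2)

print([steps_for_strong_error(e) for e in (0.1, 0.05, 0.025)])  # [100, 400, 1600]
```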
For discrete (text) settings, the denoiser is trained only on actually noised positions (cross-entropy on mismatched tokens), yielding both stability and performance at scale without full ELBO computation. Uniform diffusion in LLMs is compatible with standard Transformer backbones and parallelizable architectures, with scaling laws supporting strong convergence at large parameter counts (Zhu et al., 27 Oct 2025, Rütte et al., 11 Dec 2025).
6. Scaling Laws, Hyperparameters, and Regimes of Efficacy
Scaling analyses in large discrete diffusion LLMs reveal that uniform diffusion, while requiring more parameters to match masked-diffusion or autoregressive performance at fixed compute, becomes more token-efficient in data-bound regimes, with loss decreasing as a power law in model size and training tokens (Rütte et al., 11 Dec 2025). Optimal batch size and learning rate scale sublinearly with data and model size, and uniform diffusion benefits more from parameter scaling than masked or AR models.
Uniform noising in the forward process makes recovery harder (there is no token-identity bias toward the original token), requiring increased model capacity. However, this is offset by greater supervision per token and practical gains in fully parallel, flexible generation.
7. Limitations, Boundary Effects, and Comparative Analysis
Uniform noise, by virtue of its bounded support and low differential entropy, introduces sharper boundaries in the forward process, which complicates the reverse mapping. Whenever the reverse process must estimate densities with undefined or zero score gradients (as with uniform kernels), learning becomes less stable and empirical sample quality degrades relative to Gaussian-noise models (Jolicoeur-Martineau et al., 2023). In the infinitesimal-step ($h \to 0$) regime, central limit effects mitigate these disadvantages, and convergence to Brownian behavior is recovered (Li, 2024).
In summary, uniform noise diffusion models:
- Achieve the same $O(\sqrt{h})$ strong error as Gaussian models under standard regularity and moment-matching assumptions,
- Can be reliably and efficiently implemented in both image and discrete (text) diffusion models,
- Offer near-equal or sometimes superior empirical performance in specific domains—especially in compute- or data-bound language modeling regimes,
- Require exact scaling and should not be applied naively in low-step or location-scale reverse settings, where sample quality degrades qualitatively (Choi et al., 10 Jun 2025, Jolicoeur-Martineau et al., 2023, Rütte et al., 11 Dec 2025).