Denoising Diffusion Probabilistic Modeling

Updated 3 April 2026

DDPM is a deep generative framework that iteratively adds Gaussian noise to data and then reconstructs it via a learned reverse Markov process.
The training leverages a simplified mean squared error loss by predicting the injected noise, ensuring efficient and stable optimization.
Architectural and algorithmic advances, such as adaptive variance scheduling and latent diffusion, enhance performance in imaging, communications, and simulation.

Denoising Diffusion Probabilistic Modeling (DDPM) encompasses a class of deep generative models defined by a forward process of incrementally corrupting data with noise, and a learned reverse (denoising) process that reconstructs the target distribution by inverting this degradation. DDPMs have become foundational in modern unsupervised generative modeling, with strong empirical results and deep connections to score-based modeling, variational inference, and stochastic differential equations.

1. Mathematical Formulation: Forward and Reverse Processes

A DDPM defines two intertwined Markov chains on data $x_0 \sim q(x_0)$ :

The forward process (noising) iteratively adds Gaussian noise:

$q(x_t|x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\,x_{t-1},\, \beta_t I\right),$

where $\beta_t \in (0,1)$ is a variance schedule over $t=1,\ldots,T$ . The closed-form marginal after $t$ steps is

$q(x_t|x_0) = \mathcal{N}\left(x_t;\, \sqrt{\bar\alpha_t}\,x_0,\,(1-\bar\alpha_t) I\right),$

with $\alpha_t = 1-\beta_t$ , and $\bar\alpha_t = \prod_{s=1}^t \alpha_s$ .
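
The closed-form marginal means $x_t$ can be drawn in a single step rather than by iterating the chain. The following is a minimal sketch in plain Python, assuming a linear schedule with endpoints $10^{-4}$ and $0.02$ over $T=1000$ steps (a common choice, not stated above); the function names `linear_beta_schedule`, `alpha_bars`, and `q_sample` are illustrative.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced variances beta_1..beta_T (endpoints are assumed defaults)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]

def alpha_bars(betas):
    """Cumulative products bar_alpha_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) directly; t is 1-indexed. Returns (x_t, eps)."""
    a = abar[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [1.0, -0.5, 2.0]          # hypothetical 3-dimensional data point
x_t, eps = q_sample(x0, t=500, abar=abar, rng=rng)
```

Because $\bar\alpha_t$ decreases monotonically toward zero, late time steps are dominated by the noise term, which is what allows sampling to start from pure Gaussian noise.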

The reverse process (denoising) is a parameterized Markov chain:

$p_\theta(x_{t-1}|x_t) = \mathcal{N}\left(x_{t-1};\, \mu_\theta(x_t,t),\, \Sigma_\theta(x_t,t)\right),$

where $\mu_\theta$ and potentially $\Sigma_\theta$ are outputs of a neural network. In practice, a U-Net with time-step embedding is typically used (Ho et al., 2020, Nichol et al., 2021).

This construction admits closed-form expressions for the true reverse process $q(x_{t-1}|x_t, x_0)$ (when $x_0$ is known), which form the basis for training objectives and network parameterizations.
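
For reference, the standard closed form of the true reverse process conditioned on $x_0$ (as derived in Ho et al., 2020) is Gaussian. The symbols $\tilde\mu_t$ and $\tilde\beta_t$ are introduced here for illustration and follow from Bayes' rule applied to the forward kernels above:

$q(x_{t-1}|x_t, x_0) = \mathcal{N}\left(x_{t-1};\, \tilde\mu_t(x_t, x_0),\, \tilde\beta_t I\right),$

$\tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t, \qquad \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t.$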

2. Variational Objective and the Training Loss

Training maximizes a variational bound (ELBO) on the log marginal likelihood $\log p_\theta(x_0)$. The ELBO can be decomposed (for Gaussian transitions) into KL divergences between the forward and learned reverse kernels:

$\log p_\theta(x_0) \ge \mathbb{E}_q\left[\log p_\theta(x_0|x_1) - \sum_{t=2}^{T} D_{\mathrm{KL}}\left(q(x_{t-1}|x_t,x_0)\,\|\,p_\theta(x_{t-1}|x_t)\right) - D_{\mathrm{KL}}\left(q(x_T|x_0)\,\|\,p(x_T)\right)\right].$

Ho et al. demonstrated that when the mean $\mu_\theta$ is parameterized via prediction of the forward noise $\epsilon$, the principal contribution to the loss reduces to

$L_{\mathrm{simple}}(\theta) = \mathbb{E}_{x_0,\epsilon,t}\left[\left\|\epsilon - \epsilon_\theta\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\, t\right)\right\|^2\right],$

where $\epsilon \sim \mathcal{N}(0, I)$ and $t$ is uniformly sampled (Ho et al., 2020, Turner et al., 2024). This loss is widely used due to its computational and empirical advantages.
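
As a worked step, the noise-prediction parameterization can be made explicit. Writing $\epsilon_\theta(x_t, t)$ for the network's noise estimate and substituting $x_0 = \left(x_t - \sqrt{1-\bar\alpha_t}\,\epsilon\right)/\sqrt{\bar\alpha_t}$ into the mean of the true reverse posterior $q(x_{t-1}|x_t, x_0)$ gives the form used in Ho et al., 2020:

$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right).$

Matching $\mu_\theta$ to the posterior mean therefore amounts to matching $\epsilon_\theta$ to $\epsilon$, which is why the objective reduces to a mean squared error on the noise.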

3. Conditional and Specialized Diffusion Schemes

3.1 Conditional DDPM for Inverse Problems

In wireless transmission applications, DDPMs can be conditioned on side information. Given a degraded observation $y$ (e.g., demodulated, noisy signal), the conditional forward chain is formulated as a Gaussian kernel whose mean interpolates between the clean signal $x_0$ and $y$ according to an interpolation schedule, with a correspondingly adjusted variance. The conditional reverse chain similarly interpolates between the current noisy state and the observation in its mean, using coefficients derived from the ELBO. The loss for conditional training becomes a mean squared error on a composite target derived from the generative and conditional priors (Letafati et al., 2023).

3.2 Domain-Adapted Diffusion: Rician, Manifold, Multiplicative Noise

Recent work extends the base Gaussian assumption to accommodate complex data and noise models:

Rician DDPM (RDDPM): Adapts the chain for sodium-MRI, where raw data is magnitude-reconstructed and exhibits Rician rather than Gaussian noise. A CNN is trained to map the squared Rician-magnitude to squared Gaussian latents, then a standard DDPM reverse update is applied (Yuan et al., 2024).
Manifold-constrained DDPM: Uses projection-based schemes to define diffusion and score matching over non-Euclidean data manifolds, with losses and SDEs derived under rigorous geometric constraints. Demonstrated empirically on example manifolds and on molecular configuration spaces (Liu et al., 7 May 2025).
Speckle DDPM (SDDPM): Applies to images corrupted by multiplicative (speckle) noise, e.g., ultrasound. The process is reformulated in the log domain, where the noise is additive and the forward and reverse kernels remain Gaussian. The network maps the noisy input directly to a clean image, enabling efficient single-step denoising (Guha et al., 2023).
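
The log-domain reformulation used for speckle noise can be illustrated numerically. This is a small sketch, not the SDDPM implementation: it assumes log-normal speckle (so the log-noise is exactly Gaussian) and uses made-up intensity values.

```python
import math
import random

rng = random.Random(0)
clean = [0.2, 0.5, 0.9, 1.5]                      # hypothetical positive intensities
# Assumption: log-normal speckle, so that log(speckle) is Gaussian.
speckle = [math.exp(rng.gauss(0.0, 0.3)) for _ in clean]
noisy = [x * n for x, n in zip(clean, speckle)]   # multiplicative corruption y = x * n

# In the log domain the same corruption is additive: log y = log x + log n.
log_clean = [math.log(x) for x in clean]
log_noisy = [math.log(y) for y in noisy]
additive = [ly - lx for ly, lx in zip(log_noisy, log_clean)]  # recovers log n
```

Once the corruption is additive, the Gaussian forward and reverse kernels of the standard DDPM apply to the log-transformed image.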

4. Algorithmic Procedure and Sampling

4.1 Training

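DDPM training involves sampling $x_0$ from data, sampling a noise level $t \sim \mathrm{Uniform}\{1,\ldots,T\}$, producing $x_t$ by forward noising, then minimizing the MSE loss between the noise and the network's prediction:

$\min_\theta\; \mathbb{E}_{x_0,\epsilon,t}\left\|\epsilon - \epsilon_\theta(x_t, t)\right\|^2, \qquad x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, I).$

For conditional tasks, both $x_t$ and the degraded observation $y$ are input to the network and specialized loss forms are employed (Letafati et al., 2023).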
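
The training procedure above can be sketched end to end on a toy problem. The code below is illustrative only: it replaces the U-Net with a per-time-step linear predictor `w[t] * x_t + b[t]`, uses a hypothetical scalar dataset $x_0 \sim \mathcal{N}(1, 0.5^2)$, and assumes a short linear schedule with $T=50$; none of these choices come from the text.

```python
import math
import random

T = 50
# Assumed linear schedule; the endpoints are illustrative.
betas = [1e-3 + (0.2 - 1e-3) * i / (T - 1) for i in range(T)]
abar, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    abar.append(running)

# Toy stand-in for the U-Net: eps_theta(x_t, t) = w[t] * x_t + b[t].
w = [0.0] * T
b = [0.0] * T

def eps_theta(xt, t):
    return w[t - 1] * xt + b[t - 1]

def draw(rng):
    """Sample x_0, t, eps and form x_t by forward noising."""
    x0 = rng.gauss(1.0, 0.5)                  # hypothetical data distribution
    t = rng.randint(1, T)                     # t ~ Uniform{1, ..., T}
    eps = rng.gauss(0.0, 1.0)                 # eps ~ N(0, 1)
    a = abar[t - 1]
    xt = math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps
    return xt, t, eps

def train_step(rng, lr=0.01):
    """One SGD step on the squared error between eps and its prediction."""
    xt, t, eps = draw(rng)
    err = eps_theta(xt, t) - eps
    w[t - 1] -= lr * 2.0 * err * xt           # d(err^2)/dw = 2 * err * x_t
    b[t - 1] -= lr * 2.0 * err                # d(err^2)/db = 2 * err
    return err * err

def mean_loss(rng, n=2000):
    """Monte Carlo estimate of the loss without updating parameters."""
    total = 0.0
    for _ in range(n):
        xt, t, eps = draw(rng)
        total += (eps_theta(xt, t) - eps) ** 2
    return total / n

loss_before = mean_loss(random.Random(1))
train_rng = random.Random(0)
for _ in range(20000):
    train_step(train_rng)
loss_after = mean_loss(random.Random(1))
```

With all parameters initialized to zero the loss starts near 1 (the variance of $\epsilon$) and falls as the predictor learns to explain the noise from $x_t$; a real model would replace the linear predictor with a neural network and the manual gradients with automatic differentiation.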

4.2 Sampling

Sampling starts from $x_T \sim \mathcal{N}(0, I)$ (or conditioned initialization), then iteratively applies the learned reverse kernel:

$x_{t-1} \sim p_\theta(x_{t-1}|x_t), \qquad t = T, \ldots, 1.$

After $T$ steps, $x_0$ is a sample from the learned distribution (Ho et al., 2020).
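
The sampling loop can be sketched as follows. To keep the example self-contained, the trained network is replaced by an analytic stand-in: for a hypothetical one-dimensional Gaussian dataset $\mathcal{N}(2, 0.5^2)$, the optimal noise prediction $\mathbb{E}[\epsilon \mid x_t]$ is available in closed form. The update uses the noise-prediction form of the reverse mean with fixed variance $\sigma_t^2 = \beta_t$ and no noise on the final step, one of the choices in Ho et al., 2020; the schedule and $T=200$ are assumptions.

```python
import math
import random

T = 200
betas = [1e-4 + (0.05 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed schedule
alphas = [1.0 - beta for beta in betas]
abar, running = [], 1.0
for alpha in alphas:
    running *= alpha
    abar.append(running)

DATA_MEAN, DATA_STD = 2.0, 0.5   # hypothetical 1-D Gaussian "dataset"

def eps_model(xt, t):
    """Stand-in for a trained network: E[eps | x_t] for N(DATA_MEAN, DATA_STD^2) data."""
    a = abar[t - 1]
    var_t = a * DATA_STD ** 2 + (1.0 - a)     # marginal variance of x_t
    return math.sqrt(1.0 - a) * (xt - math.sqrt(a) * DATA_MEAN) / var_t

def sample(rng):
    """Ancestral sampling from x_T ~ N(0, 1) down to x_0."""
    x = rng.gauss(0.0, 1.0)
    for t in range(T, 0, -1):
        beta, alpha, a = betas[t - 1], alphas[t - 1], abar[t - 1]
        mu = (x - beta / math.sqrt(1.0 - a) * eps_model(x, t)) / math.sqrt(alpha)
        z = rng.gauss(0.0, 1.0) if t > 1 else 0.0   # no noise on the final step
        x = mu + math.sqrt(beta) * z                # sigma_t^2 = beta_t
    return x

rng = random.Random(0)
samples = [sample(rng) for _ in range(500)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
```

With an accurate noise predictor, the empirical mean and standard deviation of the generated samples should land close to those of the data distribution.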

5. Architectural and Algorithmic Advancements

Variance scheduling: Cosine (or learned) noise schedules improve numerical stability and sample quality over linear schedules (Nichol et al., 2021).
Learned variances: Interpolating reverse variances enables faster sampling with fewer steps, maintaining quality (Nichol et al., 2021).
Wavelet-domain DDPM: Replacing pixel-space U-Net with spatial-frequency U-Net (SFUNet) operating on discrete wavelet coefficients improves high-resolution image synthesis (Yuan et al., 2023).
Latent Diffusion Models: DDPMs applied in latent space offer up to 16×–256× computational gains with minor trade-offs in generation fidelity, crucial for large scientific datasets (Jia et al., 11 Mar 2026).
Star-Shaped DDPM: Generalizes DDPM to non-Markovian and non-Gaussian exponential-family noise processes, permitting modeling over bounded or manifold-constrained domains with natural noising processes (Beta, Dirichlet, vMF, Wishart, etc.), with exact reduction to Gaussian DDPM in the standard case (Okhotin et al., 2023).
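
As a concrete instance of the variance-scheduling item above, the cosine schedule of Nichol et al., 2021 defines $\bar\alpha_t$ directly and derives $\beta_t$ from consecutive ratios. The sketch below follows that definition with the offset $s = 0.008$ and clipping at $0.999$ used in that paper; the function names are illustrative.

```python
import math

def cosine_alpha_bar(T, s=0.008):
    """bar_alpha_t = f(t) / f(0), f(t) = cos^2(((t/T + s) / (1 + s)) * pi / 2), t = 0..T."""
    def f(t):
        return math.cos((t / T + s) / (1.0 + s) * math.pi / 2.0) ** 2
    return [f(t) / f(0) for t in range(T + 1)]

def cosine_betas(T, s=0.008, max_beta=0.999):
    """beta_t = 1 - bar_alpha_t / bar_alpha_{t-1}, clipped to avoid a singular last step."""
    abar = cosine_alpha_bar(T, s)
    return [min(1.0 - abar[t] / abar[t - 1], max_beta) for t in range(1, T + 1)]

betas = cosine_betas(1000)
```

Compared with a linear schedule, this destroys information more gradually at early time steps, and the clipping keeps the final steps numerically well behaved.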

6. Applications and Quantitative Performance

DDPMs have demonstrated state-of-the-art results across diverse domains:

Wireless communications: Conditional DDPM achieves >10 dB enhancement in PSNR over DNN baselines under low SNR and hardware-impaired conditions, without further reduction of transmission rate (Letafati et al., 2023).
Medical imaging: RDDPM yields best-in-class blind quality metrics (BRISQUE 34.46 versus DDPM 46.71, lower is better) on sodium MRI, outperforming standard DDPM and deep CNN denoisers (Yuan et al., 2024).
Scientific simulation: DDPM and latent DDPM are evaluated by L1 error on thermal/flow fields, with LDM architectures achieving an order of magnitude reduction in computational cost (Jia et al., 11 Mar 2026).
Annotated microscopy data synthesis: DDPMs initialized from rough sketches generate training data for fully supervised segmentation models with no significant drop in performance compared to models trained on manually labeled examples (Eschweiler et al., 2023).
Speckle noise: SDDPM achieves PSNR up to 32.81 (ultrasound) versus prior best DnCNN 18.13, and operates in a single denoising step (Guha et al., 2023).

7. Theoretical and Practical Significance

The DDPM framework unifies variational latent variable modeling, denoising score matching, and stochastic (Langevin) sampling, yielding a modular interface for integrating physical priors, domain-specific noise models, and architectural advances. Theoretical work generalizes DDPMs to general submanifolds (Riemannian DDPMs) and arbitrary exponential families (Star-Shaped DDPMs), establishing that nearly all prior DDPMs are special cases of this broader denoising-variational machinery (Liu et al., 7 May 2025, Okhotin et al., 2023). Practically, DDPM-based models are robust, easily scalable, and empirically competitive with GANs in sample quality and distribution coverage. Fast-sampling and domain-adapted variants now enable deployment in bandwidth-limited, safety-critical, and high-dimensional simulation applications.