
Denoising Diffusion Probabilistic Models (DDPM)

Updated 25 February 2026
  • Denoising Diffusion Probabilistic Models (DDPMs) are generative models that iteratively add noise to data and then learn a reverse process to accurately reconstruct it.
  • They employ a forward Markov chain for noise injection and a parameterized reverse chain optimized via mean-squared error losses, ensuring stability and high sample quality.
  • DDPMs have broad applications from image synthesis to medical imaging, though challenges remain in computational efficiency and adapting to non-Gaussian noise.

Denoising Diffusion Probabilistic Models (DDPMs) are a class of deep generative models constructed by composing a forward Markov chain that iteratively adds small amounts of noise to data samples and a learned reverse Markov chain that reconstructs data from pure noise. DDPMs have become foundational in generative modeling due to their stability, high sample quality, and tractable training via simple regression losses. They underpin modern state-of-the-art approaches across image, audio, tabular, and scientific data, and support rich variants for conditional, controlled, and efficient generative modeling (Ho et al., 2020, Gallon et al., 2024).

1. Mathematical Foundations

DDPMs are defined by two stochastic chains over data $x_0 \in \mathbb{R}^d$:

Forward (noising) process:

Let $\{\beta_t\}_{t=1}^T \subset (0, 1)$ be a fixed variance (noise) schedule. For $t = 1, \dots, T$,

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t I\big)$$

By induction, the marginal can be written as

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\, \sqrt{\bar\alpha_t}\, x_0,\; (1-\bar\alpha_t) I\big), \qquad \bar\alpha_t = \prod_{s=1}^t (1-\beta_s)$$

As $t \to T$, $q(x_T)$ approaches an isotropic Gaussian.
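Because the marginal above is available in closed form, $x_t$ can be drawn directly from $x_0$ without simulating the chain step by step. A minimal NumPy sketch, assuming an illustrative linear schedule (the endpoint values are common practice, not prescribed by this article):

```python
import numpy as np

# Illustrative linear variance schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # \bar{alpha}_t = prod_s (1 - beta_s)

rng = np.random.default_rng(0)

def q_sample(x0, t):
    """Draw x_t ~ q(x_t | x_0) in one shot via the closed-form marginal."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.zeros(4)           # toy "data" point
xT = q_sample(x0, T - 1)   # near t = T this is close to pure N(0, I) noise
```

With this schedule $\bar\alpha_T$ is nearly zero, so the signal term vanishes and $x_T$ is essentially standard Gaussian, matching the limit stated above.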

Reverse (denoising) process:

A learnable Markov chain is constructed with

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\; \Sigma_\theta(x_t, t)\big)$$

A common parameterization fixes $\Sigma_\theta(x_t, t) = \beta_t I$ and sets

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{1 - \beta_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}}\, \epsilon_\theta(x_t, t)\right)$$

where $\epsilon_\theta$ is a neural network trained to predict the Gaussian noise added in the forward process (Ho et al., 2020, Gallon et al., 2024).

2. Training Objectives and Loss Functions

DDPMs are fit to maximize the variational lower bound (ELBO) on the log-likelihood. For Gaussian chains, after routine algebra, the loss decomposes into per-timestep Kullback-Leibler divergences and an "output decoder" term: $\log p_\theta(x_0) \geq -L_{\mathrm{VLB}}(\theta)$, where

$$L_{\mathrm{VLB}} = \mathbb{E}_{q}\left[-\log p_\theta(x_0 \mid x_1) + \sum_{t=2}^T \mathrm{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)\right]$$

In practice, this reduces (up to weighting) to a mean-squared-error objective $L_{\mathrm{simple}} = \mathbb{E}_{t, x_0, \epsilon}\,\big\|\epsilon - \epsilon_\theta(x_t, t)\big\|^2$, where $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon$ and $\epsilon \sim \mathcal{N}(0, I)$ (Ho et al., 2020, Gallon et al., 2024, Turner et al., 2024). This objective is equivalent, up to constants, to denoising score matching.
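A single Monte Carlo evaluation of $L_{\mathrm{simple}}$ can be sketched as follows. Here `eps_model` is a hypothetical stand-in for the trained network $\epsilon_\theta$ (it just predicts zeros); in practice it would be a U-Net with a timestep embedding, as discussed in the next section.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)  # illustrative linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def eps_model(x_t, t):
    # Stand-in for the neural network eps_theta.
    return np.zeros_like(x_t)

def simple_loss(x0):
    """One Monte Carlo sample of L_simple = E ||eps - eps_theta(x_t, t)||^2."""
    t = int(rng.integers(0, T))                     # uniform random timestep
    eps = rng.standard_normal(x0.shape)             # eps ~ N(0, I)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return float(np.sum((eps - eps_model(x_t, t)) ** 2))

loss = simple_loss(np.zeros(8))
```

Training consists of repeating this draw over minibatches and descending the gradient of the loss with respect to the network parameters.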

3. Sampling Procedures and Algorithmic Implementation

Sampling from a trained DDPM proceeds via the reverse Markov chain, initialized from $x_T \sim \mathcal{N}(0, I)$:

  1. For $t = T, T-1, \dots, 1$:
    • Compute $\epsilon_\theta(x_t, t)$
    • Compute the mean $\mu_\theta(x_t, t)$ as above
    • Sample $x_{t-1} \sim \mathcal{N}(\mu_\theta(x_t, t), \beta_t I)$; at $t = 1$, return the mean $\mu_\theta(x_1, 1)$ without adding noise
  2. Return $x_0$
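The loop above can be sketched in NumPy. As before, `eps_model` is a hypothetical placeholder for a trained $\epsilon_\theta$ network, and the schedule values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)  # illustrative linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def eps_model(x_t, t):
    # Stand-in for a trained eps_theta network.
    return np.zeros_like(x_t)

def ddpm_sample(shape):
    """Ancestral sampling through the reverse chain with Sigma_theta = beta_t I."""
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        eps = eps_model(x, t)
        # Posterior mean, mu_theta = (x - beta_t / sqrt(1 - abar_t) * eps) / sqrt(1 - beta_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # final step: return the mean without added noise
    return x

sample = ddpm_sample((4,))
```

With a real network in place of the placeholder, `sample` would be an (approximate) draw from the learned data distribution.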

Key architectural details include the use of U-Net backbones with time-step embeddings, group normalization or AdaGroupNorm, and, for conditional tasks, additional conditioning modalities (Gallon et al., 2024, Xu et al., 2023, Osuna-Vargas et al., 2024). Both linear and cosine schedules for $\{\beta_t\}$ are supported; cosine schedules improve gradient stability in practice.
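The two schedules mentioned above can be constructed as follows. The cosine form follows Nichol and Dhariwal (2021), which is not among the works cited here, so the offset constant and clipping value should be treated as illustrative:

```python
import numpy as np

T = 1000

# Linear schedule: betas interpolated between small fixed endpoints.
betas_linear = np.linspace(1e-4, 0.02, T)

# Cosine schedule: define alpha_bar_t via a squared cosine, then recover
# per-step betas from the ratio of consecutive alpha_bar values.
s = 0.008  # small offset so beta_1 stays finite (illustrative constant)
steps = np.arange(T + 1) / T
f = np.cos((steps + s) / (1 + s) * np.pi / 2) ** 2
alpha_bar = f / f[0]                                   # normalize so alpha_bar_0 = 1
betas_cosine = np.clip(1.0 - alpha_bar[1:] / alpha_bar[:-1], 0.0, 0.999)
```

Relative to the linear schedule, the cosine schedule destroys signal more gradually at early timesteps and saturates near the end of the chain.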

4. Extensions, Conditioning, and Model Variants

DDPMs admit substantial algorithmic and architectural extensions; representative variants and domain-specific adaptations are described in the sections that follow.

5. Domain-Specific Adaptations: Rician and Non-Gaussian Noise

Standard DDPMs are formulated for additive Gaussian noise, but real-world modalities often exhibit non-Gaussian and biased noise characteristics:

  • Rician Denoising Diffusion Probabilistic Model (RDDPM): In magnitude MR imaging (notably sodium MRI), observed images are corrupted by Rician noise, which is not zero-mean and introduces quantifiable bias, especially at low SNR. Vanilla DDPMs trained on Gaussian assumptions systematically oversmooth and fail to correct this bias. RDDPM resolves this by converting magnitude data at each timestep from Rician to pseudo-Gaussian, applying a neural network to invert the magnitude bias, and then proceeding with standard Gaussian-denoising steps. RDDPM is shown to outperform vanilla DDPMs and CNN-based models in terms of BRISQUE (34.46, lowest), MUSIQ (2.79, lowest), and PaQ2PiQ (4.38, second-best) metrics, while preserving high-SNR structures (Yuan et al., 2024).
  • Star-Shaped and Exponential Family Diffusions: The star-shaped DDPM generalizes the forward process to conditionally independent steps within an exponential family, enabling tractable modeling of data on constrained manifolds (Beta, vMF, Dirichlet, etc.) rather than Euclidean space (Okhotin et al., 2023).

6. Practical Applications Across Scientific and Technical Domains

DDPMs have been empirically validated and applied in diverse settings:

  • Image Synthesis: State-of-the-art results on CIFAR-10 (FID = 3.17, Inception Score = 9.46) and LSUN (bedroom, church, cat subsets) (Ho et al., 2020).
  • Medical Imaging and Denoising: Enhanced image restoration under Rician noise in sodium MRI (Yuan et al., 2024), denoising for high-resolution microscopy (Osuna-Vargas et al., 2024), and robust spatial context modeling for domain-specific “implicit” statistics in medical images, outperforming GANs for contextual fidelity (Deshpande et al., 2023).
  • Wireless Communication: Conditional DDPMs provide over 10 dB improvement in low-SNR wireless image transmission, without compromising data rate (Letafati et al., 2023).
  • Astrophysics: Volumetric density estimation for molecular clouds, with order-of-magnitude error reduction over classical and deep CNN-based methods (Xu et al., 2023).
  • Adversarial Defense: DDPMs serve as purification front ends to remove adversarial perturbations, restoring up to 88% of original model accuracy under strong attacks (Ankile et al., 2023).
  • Tabular and Normative Modeling: DDPMs jointly estimate multivariate densities of high-dimensional tabular data for reference interval construction in neuroimaging; transformer-based backbones (SAINT) preserve higher-order dependence structure beyond traditional per-variable models (Whitbread et al., 2026).

7. Theoretical Properties and Convergence

Theoretical work has established the connection between DDPMs and score-based generative modeling (Langevin dynamics, denoising score matching), demonstrating that the stochastic reverse chain approximates the data distribution by iteratively applying denoising steps (Ho et al., 2020). Recent results prove that, for data with intrinsic dimension $k \ll d$, the number of diffusion steps required for accurate sample generation scales nearly linearly in $k$, not $d$, providing a theoretical explanation for empirical efficiency on high-dimensional data concentrated on low-dimensional manifolds (Huang et al., 2024).

Domain | Application | Core DDPM Adaptation/Extension
MRI denoising | Rician noise, sodium MRI | RDDPM (Rician→Gaussian conversion)
Microscopy | Image restoration | Conditional DDPM (low-res→high-res)
Wireless comms | Robust image reconstruction | Conditional DDPM (side-informed denoiser)
Astrophysics | Molecular cloud density estimation | Conditional U-Net, MSE training
Tabular biomedicine | Normative modeling, covariate calibration | FiLM/SAINT transformer denoisers

8. Limitations and Open Challenges

While DDPMs function robustly for a variety of generative tasks, domain-specific adaptations are crucial for non-Gaussian noise or constrained data spaces (e.g., Rician noise, manifold-valued data). The computational burden of many inference steps in high-fidelity generation has motivated research into deterministic samplers (DDIM) and upsampling diffusion models (UDPM) for faster, more controllable generation (Abu-Hussein et al., 2023). In multi-task and limited-data regimes, performance depends critically on architectural choices for shared and exclusive representations; negative transfer may occur for heterogeneous tasks (Pirhayatifard et al., 2023).

Limitations specific to RDDPM include dependence on a pre-trained Gaussian DDPM, inexact inversion of the Rician magnitude, and computational overhead from dual noise sampling per step (Yuan et al., 2024). In generic DDPMs, challenges persist in modeling global, joint constraints (e.g., anatomical or semantic context), high-dimensional tabular dependencies, and efficient, scalable sampling.


References:

  • "Denoising Diffusion Probabilistic Models" (Ho et al., 2020)
  • "An overview of diffusion models for generative artificial intelligence" (Gallon et al., 2024)
  • "Rician Denoising Diffusion Probabilistic Models For Sodium Breast MRI Enhancement" (Yuan et al., 2024)
  • "Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality" (Huang et al., 2024)
  • "Denoising diffusion networks for normative modeling in neuroimaging" (Whitbread et al., 2026)
  • "Denoising diffusion models for high-resolution microscopy image restoration" (Osuna-Vargas et al., 2024)
  • "Conditional Denoising Diffusion Probabilistic Models for Data Reconstruction Enhancement in Wireless Communications" (Letafati et al., 2023)
  • "Denoising Diffusion Probabilistic Models to Predict the Density of Molecular Clouds" (Xu et al., 2023)
  • "Star-Shaped Denoising Diffusion Probabilistic Models" (Okhotin et al., 2023)
  • "Improving Denoising Diffusion Probabilistic Models via Exploiting Shared Representations" (Pirhayatifard et al., 2023)
  • "Denoising Diffusion Probabilistic Models as a Defense against Adversarial Attacks" (Ankile et al., 2023)
  • "Assessing the capacity of a denoising diffusion probabilistic model to reproduce spatial context" (Deshpande et al., 2023)
  • "Denoising Diffusion Probabilistic Models in Six Simple Steps" (Turner et al., 2024)
