Denoising Diffusion Probabilistic Models (DDPMs)
Denoising diffusion probabilistic models (DDPMs) are a class of likelihood-based deep generative models that synthesize samples from complex data distributions by gradually transforming simple noise into data-like samples. These models operate by reversing a fixed forward diffusion process that incrementally adds noise to the data, and they have demonstrated competitive or state-of-the-art performance in image synthesis and other generative tasks. The training procedure is rooted in variational inference and is deeply connected to denoising score matching and Langevin dynamics, situating DDPMs at the intersection of probabilistic modeling and nonequilibrium thermodynamics.
1. Principles of Denoising Diffusion Probabilistic Models
DDPMs are formulated as latent variable models that define a Markov chain transitioning from data samples to a noise prior (typically Gaussian) over $T$ steps and then learn to reverse this noising process. The core components are:
- Forward Process ($q$): A Markov chain with fixed, time-dependent Gaussian transitions adds noise to a data point $x_0$ over $T$ steps:
  $$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big),$$
  where $\beta_1, \ldots, \beta_T$ is a predefined variance schedule.
- Reverse Process ($p_\theta$): A parameterized Markov chain (typically a neural network) is trained to model the reverse transitions:
  $$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\big),$$
  with $p(x_T) = \mathcal{N}(x_T; 0, I)$.
- Sampling: DDPMs reconstruct samples by iteratively denoising from random Gaussian noise, that is, by sampling $x_T \sim \mathcal{N}(0, I)$ and recursively applying $p_\theta(x_{t-1} \mid x_t)$ for $t = T, \ldots, 1$ (see the sketch below).
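The following minimal NumPy sketch illustrates both chains. The linear $\beta_t$ schedule, the array shapes, the choice $\sigma_t^2 = \beta_t$, and the placeholder `toy_epsilon_model` are illustrative assumptions rather than a reference implementation; the placeholder stands in for a trained neural noise predictor such as a U-Net.

```python
import numpy as np

# Minimal sketch of the forward (noising) and reverse (sampling) Markov chains.
# `toy_epsilon_model` is a placeholder for the trained noise predictor
# epsilon_theta(x_t, t); a real model would be a neural network.

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear variance schedule beta_1..beta_T (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Forward process: draw x_t given x_0 using the closed-form marginal."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

def toy_epsilon_model(x_t, t):
    """Placeholder noise predictor; returns zeros so the loop is runnable."""
    return np.zeros_like(x_t)

def p_sample_loop(shape, rng, epsilon_model=toy_epsilon_model):
    """Reverse process: start from x_T ~ N(0, I) and denoise step by step."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = epsilon_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise   # uses the sigma_t^2 = beta_t choice
    return x

rng = np.random.default_rng(0)
x_noised, true_eps = q_sample(np.zeros((2, 32, 32, 3)), t=500, rng=rng)
sample = p_sample_loop((2, 32, 32, 3), rng)
```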
2. Variational Objective and Connection to Score Matching
The central training objective is a variational lower bound (ELBO) on the data log-likelihood. In negative form it decomposes as
$$\mathbb{E}\big[-\log p_\theta(x_0)\big] \le \mathbb{E}_q\Big[ D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) + \sum_{t>1} D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) - \log p_\theta(x_0 \mid x_1)\Big].$$
Because all transitions are Gaussian, each KL term can be evaluated in closed form.
A key insight is that the loss term for the reverse process's mean can be written as a weighted mean squared error between the actual noise and the model's noise prediction:
$$L_{t-1} = \mathbb{E}_{x_0, \epsilon}\Big[ \frac{\beta_t^2}{2\sigma_t^2 \alpha_t (1 - \bar\alpha_t)} \,\big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\; t\big) \big\|^2 \Big],$$
where $\alpha_t = 1 - \beta_t$, $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, and $\epsilon_\theta$ is the neural noise predictor. This is identical in form to the objective for denoising score matching and closely related to annealed Langevin dynamics: training a DDPM amounts to estimating the score function of the perturbed data distribution across noise scales, and the iterative denoising at inference resembles Langevin dynamics driven by that learned score.
Empirically, an unweighted version of this loss,
$$L_{\text{simple}}(\theta) = \mathbb{E}_{t, x_0, \epsilon}\Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\; t\big) \big\|^2 \Big],$$
down-weights the terms at small $t$, focusing training on the more challenging, highly corrupted samples and leading to improved perceptual sample quality.
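As a sketch of how this objective is estimated in practice, the snippet below computes one Monte Carlo draw of $L_{\text{simple}}$. The NumPy setting, the `epsilon_model(x_t, t)` interface, and the `alpha_bars` array are assumptions carried over from the previous sketch; a real training loop would compute the same quantity in an autodiff framework to obtain gradients.

```python
import numpy as np

def l_simple(x0_batch, epsilon_model, alpha_bars, rng):
    """One Monte Carlo estimate of L_simple for a batch of clean data x_0."""
    T = len(alpha_bars)
    t = rng.integers(0, T, size=x0_batch.shape[0])       # uniform timestep per example
    eps = rng.standard_normal(x0_batch.shape)            # target noise epsilon
    # Broadcast the per-example \bar{alpha}_t over the remaining data dimensions.
    ab = alpha_bars[t].reshape(-1, *([1] * (x0_batch.ndim - 1)))
    x_t = np.sqrt(ab) * x0_batch + np.sqrt(1.0 - ab) * eps   # q(x_t | x_0) in closed form
    eps_hat = epsilon_model(x_t, t)
    return np.mean((eps - eps_hat) ** 2)                 # unweighted noise-prediction MSE
```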
3. Progressive Decoding and Interpretable Compression
DDPMs admit an interpretation as progressive lossy decompressors. The forward process spreads information about $x_0$ across the latent sequence $x_1, \ldots, x_T$, and transmitting the latents in the order $x_T, x_{T-1}, \ldots, x_1$ allows progressive reconstruction of $x_0$. Unlike classical autoregressive decoders, which follow a hard coordinate-wise ordering, the DDPM trajectory is continuous in noise space, yielding a generalization of sequential coding. This interpretation allows DDPMs to be analyzed with information-theoretic rate-distortion tools, which indicate that most of the coding budget is spent describing small perceptual details.
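To make the progressive-decoding picture concrete, the short sketch below computes the point estimate of $x_0$ implied by a partially denoised $x_t$; it reuses the assumed `alpha_bars` schedule and `epsilon_model` interface from the earlier snippets. Logging this estimate at a few timesteps during sampling shows coarse structure emerging first and fine perceptual detail last.

```python
import numpy as np

def predict_x0(x_t, t, epsilon_model, alpha_bars):
    """Point estimate of x_0 implied by x_t and the predicted noise:
    x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_theta(x_t, t)) / sqrt(alpha_bar_t)."""
    eps_hat = epsilon_model(x_t, t)
    return (x_t - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
```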
4. Empirical Performance and Benchmark Comparisons
The original DDPM framework achieves strong results on image generation benchmarks, including state-of-the-art FID on unconditional CIFAR10:
- Unconditional CIFAR10: Inception Score $9.46$, FID $3.17$, and a test negative log-likelihood of about $3.75$ bits/dim, competitive with GAN and autoregressive models.
- LSUN 256x256: Sample quality matches or exceeds ProgressiveGAN, though StyleGAN remains superior by FID on some splits.
In summary (unconditional CIFAR10, IS and FID):

| Model | IS | FID |
|---|---|---|
| NCSN (score matching) | 8.87 | 25.32 |
| SNGAN-DDLS (GAN) | 9.09 | 15.42 |
| StyleGAN2 + ADA (conditional) | 9.74 | 3.26 |
| DDPM ($L_{\text{simple}}$) | 9.46 | 3.17 |
DDPMs close the gap with GANs on perceptual metrics, sometimes outperforming even class-conditional models, while retaining a likelihood-based foundation and more stable training.
5. Theoretical Unification and Future Implications
Denoising diffusion probabilistic models reveal a unification across several areas of generative modeling:
- Connection to latent variable models: The forward/reverse pair forms a latent variable model with an explicit evidence lower bound.
- Connection to score matching/Langevin dynamics: Training aligns with denoising score matching objectives and sampling implements annealed Langevin dynamics.
- Autoregressive-like progressive decoding: Sampling can be seen as generalized lossy decompression.
This unified view enables transfer of architectural, algorithmic, and analytical advances between generative modeling paradigms—score-based models, VAEs, and energy-based models.
Future research directions suggested include leveraging more expressive decoders, integrating hybrid energy/autoregressive modules, efficient sampling approaches, and extending the lossy decompression view to modalities beyond images.
6. Summary of Main Mathematical Expressions
- Forward process: $q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big)$
- Reverse transition update (sampling step): $x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\, \epsilon_\theta(x_t, t)\Big) + \sigma_t z, \quad z \sim \mathcal{N}(0, I)$
- Weighted variational objective: $L = \mathbb{E}_q\Big[ D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) + \sum_{t>1} D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) - \log p_\theta(x_0 \mid x_1) \Big]$
- Simplified mean squared error loss: $L_{\text{simple}}(\theta) = \mathbb{E}_{t, x_0, \epsilon}\big[ \| \epsilon - \epsilon_\theta(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,\, t) \|^2 \big]$
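A small numerical check of the schedule quantities appearing in these expressions, assuming the illustrative linear $\beta_t$ schedule used in the sketches above:

```python
import numpy as np

# The signal coefficient sqrt(alpha_bar_t) decays toward 0 and the noise
# variance (1 - alpha_bar_t) grows toward 1, so x_T is nearly standard Gaussian.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule, not prescribed by the text
alpha_bars = np.cumprod(1.0 - betas)

for t in (0, 249, 499, 999):
    print(f"t={t:4d}  signal={np.sqrt(alpha_bars[t]):.4f}  noise_var={1 - alpha_bars[t]:.4f}")
```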
7. Impact and Ongoing Developments
DDPMs establish a scalable, robust, and theoretically grounded framework for deep generative modeling, demonstrating that likelihood-based models can achieve or surpass GAN-level sample quality with improved mode coverage and training stability. Their foundational ties to score-based methods and variational inference clarify the algorithm's operating principles and suggest broad potential for interpretability, compression, and scalable generation across data modalities. Ongoing research continues to explore more expressive decoders, more efficient sampling, better rate-distortion behavior under the lossy-compression interpretation, and applications to domains beyond images, leveraging the architecture's flexibility and principled mathematical structure.