
Diffusion-Based Generative Models (EDMs)

Updated 3 February 2026
  • Diffusion-based generative models (EDMs) are deep learning frameworks that map complex data distributions to noise and reverse the process to generate high-quality samples.
  • They employ a two-stage architecture with a fixed forward diffusion process and a neural network-driven reverse process, achieving robust synthesis in images, audio, graphs, and scientific data.
  • Recent advances include accelerated ODE solvers, non-asymptotic convergence guarantees, and energy-based parameterizations that improve sampling efficiency and theoretical rigor.

Diffusion-based generative models (EDMs) define a powerful class of deep generative models rooted in forward–reverse stochastic processes, where a complex data distribution is mapped to noise via a prescribed noising process (typically a Markov chain or SDE), and generation is achieved by numerically approximating the reverse of this process using neural-parameterized drift or score fields. EDMs form the backbone of state-of-the-art algorithms for image, audio, graph, and scientific-data synthesis, and have been theoretically unified via connections to variational inference, score-matching, and energy-based modeling. Recent developments have advanced both the algorithmic efficiency and the theoretical foundation of these models, including non-asymptotic error bounds, accelerated sampling, energy-function parameterizations, and rigorous convergence analysis.

1. Probabilistic and Mathematical Foundations

EDMs are characterized by a two-stage architecture: a (non-learned) forward "diffusion" process and a (learned) reverse "denoising" process. In discrete time, with observed data $x_0 \sim p_{\text{data}}(x_0)$ and a fixed schedule $\{\beta_t\}_{t=1}^T$, the forward process defines a Markov chain:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right), \qquad \alpha_t = 1-\beta_t, \quad \bar{\alpha}_t = \prod_{i=1}^{t}\alpha_i,$$

so that $q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)I\right)$ (Diao et al., 25 Oct 2025, Li et al., 2023, Le, 2024).
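The closed-form marginal means a sample can be noised to any step $t$ directly, without simulating the chain step by step. Below is a minimal NumPy sketch of this forward noising under an assumed linear $\beta$ schedule; the schedule endpoints and function names are illustrative choices, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Linear beta schedule (an illustrative choice) with cumulative products alpha_bar_t."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

betas, alphas, alpha_bars = make_schedule()
x0 = np.ones(8)                                   # toy "data" point
x_t, eps = q_sample(x0, t=500, alpha_bars=alpha_bars)
```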

The continuous-time limit leads to an Itô SDE, e.g., the variance-preserving (VP) formulation $dx = -\tfrac{1}{2}\,\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dW_t$, with similar closed-form forward marginals (Yeğin et al., 2024, Diao et al., 25 Oct 2025).

Reverse-time generative modeling uses another SDE, obtained via Anderson's theorem, whose drift involves the unknown score $\nabla_x \log p_t(x)$ (Huang et al., 2021, Yeğin et al., 2024):

$$dx = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{W}_t,$$

where a neural network $s_\theta(x,t)$ is trained to approximate the score (Le, 2024). The discrete reverse kernel, parameterized by a neural network with mean $\mu_\theta(x_t,t)$ and covariance $\Sigma_\theta(x_t,t)$, takes the general form

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t,t),\ \Sigma_\theta(x_t,t)\right),$$

or, in the widely used $\epsilon$-parameterization, the network directly predicts the noise injected by the forward process (Diao et al., 25 Oct 2025).
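The score, noise ($\epsilon$), and posterior-mean parameterizations are interchangeable given the forward marginals. A brief NumPy sketch of the standard conversions, reusing the schedule arrays from the sketch above; the `eps_pred` argument stands in for the output of a hypothetical $\epsilon$-predicting network:

```python
import numpy as np

def eps_to_score(eps_pred, t, alpha_bars):
    """Score estimate implied by a noise prediction:
    s_theta(x_t, t) = -eps_theta(x_t, t) / sqrt(1 - alpha_bar_t)."""
    return -eps_pred / np.sqrt(1.0 - alpha_bars[t])

def eps_to_posterior_mean(x_t, eps_pred, t, alphas, alpha_bars):
    """DDPM-style reverse mean implied by the same noise prediction:
    mu = (x_t - (1 - alpha_t) / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    return (x_t - coef * eps_pred) / np.sqrt(alphas[t])
```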

Key training objectives include the variational lower bound (ELBO), exact and denoising score matching (DSM), and their continuous-time analogues (Huang et al., 2021, Yeğin et al., 2024, Zhang et al., 2024).

2. Training Objectives and Algorithmic Variants

The foundational objectives are:

  • Variational Lower Bound (ELBO) on the marginal likelihood of the data:

$$\mathcal{L}_{\text{VLB}} = \mathbb{E}\left[\text{KL}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) + \cdots + \text{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) + \cdots\right]$$

Each KL term can be written in closed form for fixed (Gaussian) kernels (Le, 2024).

  • Denoising Score Matching (DSM) at a fixed noise level $\sigma$, matching the model score to the gradient of the Gaussian perturbation kernel (a code sketch follows the list below):

$$\ell(\theta;\sigma) = \mathbb{E}_{x_0,\ \tilde{x}\sim\mathcal{N}(x_0,\,\sigma^2 I)}\left[\left\|\, s_\theta(\tilde{x},\sigma) + \frac{\tilde{x}-x_0}{\sigma^2} \right\|_2^2\right].$$

  • Weighted Fisher Score Matching / Continuous-time ELBO (for SDEs), connecting DSM to the negative log-density of the reverse-time process (Huang et al., 2021).
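A minimal NumPy sketch of the DSM objective above; the toy score function and Gaussian data are illustrative placeholders, not drawn from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, x0_batch, sigma):
    """Monte-Carlo estimate of E || s_theta(x_tilde, sigma) + (x_tilde - x0) / sigma^2 ||^2."""
    noise = rng.standard_normal(x0_batch.shape)
    x_tilde = x0_batch + sigma * noise
    target = -(x_tilde - x0_batch) / sigma**2          # equals -noise / sigma
    residual = score_fn(x_tilde, sigma) - target
    return np.mean(np.sum(residual**2, axis=-1))

# Toy check: for x0 ~ N(0, I) the sigma-smoothed marginal is N(0, (1 + sigma^2) I),
# whose score s(x) = -x / (1 + sigma^2) minimizes the objective in expectation.
true_score = lambda x, sigma: -x / (1.0 + sigma**2)
x0_batch = rng.standard_normal((256, 2))
print(dsm_loss(true_score, x0_batch, sigma=0.5))
```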

Algorithmic variants arise from changes to the objective or the generator–inference chain (Yeğin et al., 2024, Diao et al., 25 Oct 2025):

  • DDPM (Ho et al.): vanilla ELBO-based discrete Gaussian diffusion.
  • Improved DDPM: hybrid learned variance, cosine or other noise schedules.
  • Score-based models/NCSN: DSM over a grid of noise levels, with sampling via annealed Langevin dynamics.
  • EDM/Elucidated Diffusion Model (Karras et al.): continuous-time, noise-scale (σ) parameterization with high-order ODE samplers (Zhu et al., 2023).
  • Energy-Based Diffusion Models (EBDMs): parameterize the reverse process via a neural scalar energy Eθ(x,t)E_\theta(x,t), enabling direct estimation of (unnormalized) log-priors and MH correction (Diao et al., 25 Oct 2025).
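For intuition on the energy-based parameterization in the last item, the reverse-process score is recovered as the negative gradient of the scalar energy. A toy sketch with an analytically known quadratic energy (a stand-in for a learned $E_\theta$), checking the gradient by finite differences:

```python
import numpy as np

def energy(x, sigma_t):
    """Toy quadratic energy E(x, t) = ||x||^2 / (2 sigma_t^2); the implied score
    is s(x, t) = -grad_x E(x, t) = -x / sigma_t^2."""
    return np.sum(x**2) / (2.0 * sigma_t**2)

def score_from_energy(x, sigma_t, h=1e-5):
    """Score as the negative energy gradient, here via central finite differences
    (a learned energy model would use automatic differentiation instead)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (energy(x + e, sigma_t) - energy(x - e, sigma_t)) / (2.0 * h)
    return -grad

x = np.array([1.0, -2.0, 0.5])
print(score_from_energy(x, sigma_t=1.5))   # approximately -x / 1.5**2
```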

3. Sampling and Inference Mechanisms

Sampling proceeds by numerically integrating the learned reverse dynamics. The main algorithms include:

  • Ancestral Sampling: backward Markov chain starting from $x_T \sim \mathcal{N}(0,I)$, with $x_{t-1}$ sampled from $p_\theta(x_{t-1} \mid x_t)$.
  • Probability-Flow ODE Solvers: integrate deterministic ODEs corresponding to the SDE's marginal distributions (e.g., DDIM, DPM-Solver, exponential integrators), allowing for large step sizes and efficient sampling (Zhu et al., 2023, Pokle et al., 2022, Yeğin et al., 2024).
  • Langevin (Predictor–Corrector) Samplers: alternate SDE-based predictor updates and score-based corrector (Langevin) steps (Yeğin et al., 2024, Liu et al., 2023).
  • Metropolis–Hastings Corrected Diffusion: in energy-based frameworks, each reverse transition is subjected to an MH test for bias/fidelity improvement (Diao et al., 25 Oct 2025).
  • Deep Equilibrium Solvers: rephrase the entire DDIM chain as a single fixed-point system, allowing for parallel root-finding via Anderson acceleration, improving single-sample speed and inversion (Pokle et al., 2022).
  • Discrete/Non-Gaussian and Constrained-domain Diffusion: apply to categorical or structured data, e.g., via bridges or h-transforms (Liu et al., 2022, Liu et al., 2023).

Representative sampling pseudocode for the stochastic DDPM sampler (Li et al., 2023):

Sample Y_T ~ N(0, I)
for t = T, ..., 1:
    Sample Z_t ~ N(0, I)
    Y_{t-1} ← (Y_t + (1 − α_t) · s_t(Y_t)) / √α_t + σ_t · Z_t
return Y_0
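A runnable NumPy version of the sampler above, assuming a callable score function; the choice $\sigma_t = \sqrt{\beta_t}$ and the noise-free final step are common conventions adopted here for concreteness, not prescriptions from the cited analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def ddpm_sample(score_fn, betas, dim):
    """Stochastic DDPM-style sampler following the pseudocode above.
    Arrays are 0-indexed: betas[t] corresponds to step t+1 of the schedule."""
    alphas = 1.0 - betas
    y = rng.standard_normal(dim)                                   # Y_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        z = rng.standard_normal(dim) if t > 0 else np.zeros(dim)   # no noise at the last step
        y = (y + (1.0 - alphas[t]) * score_fn(y, t)) / np.sqrt(alphas[t]) \
            + np.sqrt(betas[t]) * z
    return y

# Toy usage: if the data were N(0, I), the true score at every step is s(y, t) = -y.
betas = np.linspace(1e-4, 0.02, 1000)                              # illustrative schedule
sample = ddpm_sample(lambda y, t: -y, betas, dim=8)
```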

4. Theoretical Properties and Convergence

Recent research has provided non-asymptotic, finite-sample convergence rates for discrete-time diffusion samplers (Li et al., 2023). With $T$ sampling steps and access to accurate score estimates:

  • Probability-flow ODE sampler: convergence in total-variation (TV) distance at rate $O(1/T)$, improving to $O(1/T^2)$ under further acceleration (bias/variance corrections).
  • Stochastic DDPM sampler: TV (and KL) convergence at rate $O(1/\sqrt{T})$, boosted to $O(1/T)$ with variance correctors.

The TV bounds scale polynomially in the data dimension $d$ and depend linearly on the mean squared error between the learned and the true score. No global smoothness or log-Sobolev assumptions are required; only boundedness of the forward process is necessary (Li et al., 2023).

In energy-based diffusion models, explicit modeling of the log-prior energy enables true MH corrections, ensuring unbiased posterior sampling even under weak likelihood or strong prior regimes (Diao et al., 25 Oct 2025).
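For background on how such a correction operates mechanically, the following is a generic Metropolis–Hastings step for a target density proportional to $\exp(-E(x))$ with a symmetric Gaussian random-walk proposal; it illustrates the acceptance rule only and is not the specific correction scheme of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, energy_fn, step_size=0.1):
    """One Metropolis-Hastings step targeting p(x) proportional to exp(-E(x)).
    With a symmetric proposal, accept with probability min(1, exp(E(x) - E(x')))."""
    proposal = x + step_size * rng.standard_normal(x.shape)
    log_accept = energy_fn(x) - energy_fn(proposal)
    if np.log(rng.uniform()) < log_accept:
        return proposal, True
    return x, False

# Toy target: standard Gaussian, E(x) = ||x||^2 / 2.
energy_fn = lambda x: 0.5 * np.sum(x**2)
x = np.zeros(2)
for _ in range(1000):
    x, accepted = mh_step(x, energy_fn)
```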

Variants based on bridge processes, e.g., for discrete or constrained domains, yield non-asymptotic KL divergence error bounds combining discretization and statistical estimation errors (Liu et al., 2022).

5. Practical Implementations and Applications

EDMs demonstrate broad versatility and have been instantiated in diverse domains:

  • Image, video, and audio synthesis: EDMs trained in the raw or spectrogram domain (e.g., EDMSound) achieve state-of-the-art fidelity with accelerated ODE sampling (e.g., DPM-Solver), as demonstrated by Fréchet Audio Distance (FAD) and FID benchmarking (Zhu et al., 2023).
  • Scientific and engineering inverse problems: Energy-based EDMs with MH correction provide robust posterior sampling for high-dimensional parameter estimation in MIMO channel estimation, outperforming conventional DMs and other baselines in normalized MSE, even under limited pilot overhead (Diao et al., 25 Oct 2025).
  • Structured data, graphs, and molecules: Graph-structured EDMs employ noise-perturbed adjacency matrices or node labels and learn permutation-equivariant GNNs for score prediction, achieving SOTA in molecular conformation and design (Liu et al., 2023).
  • Materials science: Denoising diffusion models reconstruct complex microstructures with minimal hand-engineering, accurately matching real data in spatial statistics and grain-size distribution (Lee et al., 2022).
  • Speech enhancement: Unsupervised STFT-domain diffusion with EM posterior sampling yields competitive speech denoising, generalizing robustly to mismatched or unseen noise distributions (Nortier et al., 2023).

Modern architectures typically leverage U-Net (or, for graphs, GNN) backbones with step/noise-level conditioning, self-attention, and classifier-free guidance for conditional generation (Zhu et al., 2023, Yeğin et al., 2024). Convex optimization formulations of the shallow-network DSM objective yield exact solutions and non-asymptotic convergence guarantees for two-layer ReLU networks (Zhang et al., 2024).
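Classifier-free guidance, mentioned above, combines conditional and unconditional noise predictions at sampling time. A minimal sketch of the standard combination rule, where `eps_model` is a hypothetical callable and `cond=None` denotes the unconditional branch:

```python
def cfg_eps(eps_model, x_t, t, cond, guidance_scale=3.0):
    """Classifier-free guidance: eps = eps_uncond + w * (eps_cond - eps_uncond).
    guidance_scale w = 1 recovers the purely conditional prediction; larger w
    trades sample diversity for conditional fidelity."""
    eps_uncond = eps_model(x_t, t, None)
    eps_cond = eps_model(x_t, t, cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```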

6. Generalizations, Limitations, and Future Directions

EDMs admit broad generalizations:

  • Energy-based parameterizations and compositional energy priors: Explicit modeling of energy enables simultaneous incorporation of multiple constraints via MH correction, with applications in compositional generation across vision, audio, and scientific computing (Diao et al., 25 Oct 2025).
  • Flexible SDE parameterizations: Learning the spatial part of the forward SDE (e.g., Riemannian metric, Hamiltonian twist) extends the family of EDMs beyond fixed VP/VE SDEs, leading to unified and potentially better-optimized models (Du et al., 2022).
  • Bridge processes and constrained-domain extensions: Conditioning diffusion processes on endpoint or constraint sets enables direct modeling of discrete, categorical, and manifold data (Liu et al., 2022, Liu et al., 2023).
  • Accelerated and equilibrium sampling: Parallelized fixed-point solvers, high-order ODE/implicit samplers, or knowledge-distilled few-step students have addressed sampling speed bottlenecks, particularly in real-time or embedded contexts (Pokle et al., 2022, Zhu et al., 2023).
  • Discrete data and non-Gaussian noise: Categorical or non-Gaussian corruption models extend EDMs to sequences, graphs, and structured data (Yeğin et al., 2024, Liu et al., 2023).

Limitations include high sampling cost (mitigated by accelerated samplers), lack of one-size-fits-all corruption kernels for general discrete data, difficulties in likelihood evaluation on structured data, and open questions on the tightness of theoretical bounds (especially in high dimensions). Evaluation metrics can inadvertently miss memorization or mode collapse in high-capacity models (Yeğin et al., 2024, Zhu et al., 2023).

Open theoretical directions focus on tight ELBO–NLL gaps, generalization in data-sparse regimes, and unification of statistical and algorithmic error analysis. Applications continue to broaden, including scientific inverse problems, classifier augmentation in imbalanced data, and compositional tasks requiring multi-energy integration (Diao et al., 25 Oct 2025, Le, 2024).


Summary Table: Key Algorithmic and Theoretical Properties

| Aspect | EDMs (classic/energy-based) | Recent Theoretical Insights |
|---|---|---|
| Forward Mapping | Gaussian Markov chain/SDE | Abstract SDE, flexible metrics |
| Reverse Process | NN-parameterized score or mean | ODE/SDE, energy-function, bridge |
| Training Objective | ELBO, DSM, Fisher divergence | Convexification, finite-sample guarantees |
| Sampling Algorithm | Markov, ODE (DDIM/DPM-Solver), MH-corrected | Accelerated non-asymptotic TV bounds |
| Domain Adaptation | Images, audio, graphs, scientific | Discrete, constrained, compositional |
| Notable Results | SOTA generation, inverse sampling, microstructure reconstruction | $O(1/T)$ / $O(1/T^2)$ TV rates, EM-style bridges |

For comprehensive reviews and technical depth, see (Diao et al., 25 Oct 2025, Li et al., 2023, Yeğin et al., 2024, Huang et al., 2021, Zhu et al., 2023), and (Le, 2024).
