
Diffusion-Based Generative Models (EDMs)

Updated 3 February 2026
  • Diffusion-based generative models (EDMs) are deep learning frameworks that map complex data distributions to noise and reverse the process to generate high-quality samples.
  • They employ a two-stage architecture with a fixed forward diffusion process and a neural network-driven reverse process, achieving robust synthesis in images, audio, graphs, and scientific data.
  • Recent advances include accelerated ODE solvers, non-asymptotic convergence guarantees, and energy-based parameterizations that improve sampling efficiency and theoretical rigor.

Diffusion-based generative models (EDMs) define a powerful class of deep generative models rooted in forward–reverse stochastic processes, where a complex data distribution is mapped to noise via a prescribed noising process (typically a Markov chain or SDE), and generation is achieved by numerically approximating the reverse of this process using neural-parameterized drift or score fields. EDMs form the backbone of state-of-the-art algorithms for image, audio, graph, and scientific-data synthesis, and have been theoretically unified via connections to variational inference, score-matching, and energy-based modeling. Recent developments have advanced both the algorithmic efficiency and the theoretical foundation of these models, including non-asymptotic error bounds, accelerated sampling, energy-function parameterizations, and rigorous convergence analysis.

1. Probabilistic and Mathematical Foundations

EDMs are characterized by a two-stage architecture: a (non-learned) forward "diffusion" process and a (learned) reverse "denoising" process. In discrete time, with observed data $x_0 \sim p_{\text{data}}(x_0)$ and a fixed schedule $\{\beta_t\}_{t=1}^T$, the forward process defines a Markov chain:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right), \qquad \alpha_t = 1-\beta_t, \quad \bar{\alpha}_t = \prod_{i=1}^{t}\alpha_i,$$

so that $q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)I\right)$ (Diao et al., 25 Oct 2025, Li et al., 2023, Le, 2024).
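The closed-form marginal means a sample can be noised to any step $t$ directly, without simulating the chain step by step. Below is a minimal NumPy sketch of this forward noising under an assumed linear $\beta$ schedule; the schedule endpoints and function names are illustrative choices, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Linear beta schedule (an illustrative choice) with cumulative products alpha_bar_t."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

betas, alphas, alpha_bars = make_schedule()
x0 = np.ones(8)                                   # toy "data" point
x_t, eps = q_sample(x0, t=500, alpha_bars=alpha_bars)
```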

The continuous-time limit leads to an Itô SDE, e.g., the variance-preserving (VP) formulation $dx = -\tfrac{1}{2}\,\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dW_t$, with similar closed-form forward marginals (Yeğin et al., 2024, Diao et al., 25 Oct 2025).

Reverse-time generative modeling uses another SDE, obtained via Anderson's theorem, whose drift involves the unknown score $\nabla_x \log p_t(x)$ (Huang et al., 2021, Yeğin et al., 2024):

$$dx = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{W}_t,$$

where a neural network $s_\theta(x,t)$ is trained to approximate the score (Le, 2024). The discrete reverse kernel, parameterized by a neural network with mean $\mu_\theta(x_t,t)$ and covariance $\Sigma_\theta(x_t,t)$, takes the general form

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t,t),\ \Sigma_\theta(x_t,t)\right),$$

or, in the widely used $\epsilon$-parameterization, the network directly predicts the noise injected by the forward process (Diao et al., 25 Oct 2025).
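The score, noise ($\epsilon$), and posterior-mean parameterizations are interchangeable given the forward marginals. A brief NumPy sketch of the standard conversions, reusing the schedule arrays from the sketch above; the `eps_pred` argument stands in for the output of a hypothetical $\epsilon$-predicting network:

```python
import numpy as np

def eps_to_score(eps_pred, t, alpha_bars):
    """Score estimate implied by a noise prediction:
    s_theta(x_t, t) = -eps_theta(x_t, t) / sqrt(1 - alpha_bar_t)."""
    return -eps_pred / np.sqrt(1.0 - alpha_bars[t])

def eps_to_posterior_mean(x_t, eps_pred, t, alphas, alpha_bars):
    """DDPM-style reverse mean implied by the same noise prediction:
    mu = (x_t - (1 - alpha_t) / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    return (x_t - coef * eps_pred) / np.sqrt(alphas[t])
```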

Key training objectives include the variational lower bound (ELBO), exact and denoising score matching (DSM), and their continuous-time analogues (Huang et al., 2021, Yeğin et al., 2024, Zhang et al., 2024).

2. Training Objectives and Algorithmic Variants

The foundational objectives are:

  • Variational Lower Bound (ELBO) on the marginal likelihood of the data:

$$\mathcal{L}_{\text{VLB}} = \mathbb{E}\left[\text{KL}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) + \cdots + \text{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) + \cdots\right]$$

Each KL term can be written in closed form for fixed (Gaussian) kernels (Le, 2024).

  • Denoising Score Matching (DSM) at a fixed noise level $\sigma$, matching the model score to the gradient of the Gaussian perturbation kernel (a code sketch follows the list below):

$$\ell(\theta;\sigma) = \mathbb{E}_{x_0,\ \tilde{x}\sim\mathcal{N}(x_0,\,\sigma^2 I)}\left[\left\|\, s_\theta(\tilde{x},\sigma) + \frac{\tilde{x}-x_0}{\sigma^2} \right\|_2^2\right].$$

  • Weighted Fisher Score Matching / Continuous-time ELBO (for SDEs), connecting DSM to the negative log-density of the reverse-time process (Huang et al., 2021).
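A minimal NumPy sketch of the DSM objective above; the toy score function and Gaussian data are illustrative placeholders, not drawn from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, x0_batch, sigma):
    """Monte-Carlo estimate of E || s_theta(x_tilde, sigma) + (x_tilde - x0) / sigma^2 ||^2."""
    noise = rng.standard_normal(x0_batch.shape)
    x_tilde = x0_batch + sigma * noise
    target = -(x_tilde - x0_batch) / sigma**2          # equals -noise / sigma
    residual = score_fn(x_tilde, sigma) - target
    return np.mean(np.sum(residual**2, axis=-1))

# Toy check: for x0 ~ N(0, I) the sigma-smoothed marginal is N(0, (1 + sigma^2) I),
# whose score s(x) = -x / (1 + sigma^2) minimizes the objective in expectation.
true_score = lambda x, sigma: -x / (1.0 + sigma**2)
x0_batch = rng.standard_normal((256, 2))
print(dsm_loss(true_score, x0_batch, sigma=0.5))
```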

Algorithmic variants arise from changes to the objective or the generator–inference chain (Yeğin et al., 2024, Diao et al., 25 Oct 2025):

  • DDPM (Ho et al.): vanilla ELBO-based discrete Gaussian diffusion.
  • Improved DDPM: hybrid learned variance, cosine or other noise schedules.
  • Score-based models/NCSN: DSM over a grid of noise levels, with sampling via annealed Langevin dynamics.
  • EDM/Elucidated Diffusion Model (Karras et al.): continuous-time, noise-scale (σ) parameterization with high-order ODE samplers (Zhu et al., 2023).
  • Energy-Based Diffusion Models (EBDMs): parameterize the reverse process via a neural scalar energy Eθ(x,t)E_\theta(x,t), enabling direct estimation of (unnormalized) log-priors and MH correction (Diao et al., 25 Oct 2025).
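For intuition on the energy-based parameterization in the last item, the reverse-process score is recovered as the negative gradient of the scalar energy. A toy sketch with an analytically known quadratic energy (a stand-in for a learned $E_\theta$), checking the gradient by finite differences:

```python
import numpy as np

def energy(x, sigma_t):
    """Toy quadratic energy E(x, t) = ||x||^2 / (2 sigma_t^2); the implied score
    is s(x, t) = -grad_x E(x, t) = -x / sigma_t^2."""
    return np.sum(x**2) / (2.0 * sigma_t**2)

def score_from_energy(x, sigma_t, h=1e-5):
    """Score as the negative energy gradient, here via central finite differences
    (a learned energy model would use automatic differentiation instead)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (energy(x + e, sigma_t) - energy(x - e, sigma_t)) / (2.0 * h)
    return -grad

x = np.array([1.0, -2.0, 0.5])
print(score_from_energy(x, sigma_t=1.5))   # approximately -x / 1.5**2
```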

3. Sampling and Inference Mechanisms

Sampling proceeds by numerically integrating the learned reverse dynamics. The main algorithms include:

  • Ancestral Sampling: backward Markov chain starting from $x_T \sim \mathcal{N}(0,I)$, with $x_{t-1}$ sampled from $p_\theta(x_{t-1} \mid x_t)$.
  • Probability-Flow ODE Solvers: integrate deterministic ODEs corresponding to the SDE's marginal distributions (e.g., DDIM, DPM-Solver, exponential integrators), allowing for large step sizes and efficient sampling (Zhu et al., 2023, Pokle et al., 2022, Yeğin et al., 2024).
  • Langevin (Predictor–Corrector) Samplers: alternate SDE-based predictor updates and score-based corrector (Langevin) steps (Yeğin et al., 2024, Liu et al., 2023).
  • Metropolis–Hastings Corrected Diffusion: in energy-based frameworks, each reverse transition is subjected to an MH test for bias/fidelity improvement (Diao et al., 25 Oct 2025).
  • Deep Equilibrium Solvers: rephrase the entire DDIM chain as a single fixed-point system, allowing for parallel root-finding via Anderson acceleration, improving single-sample speed and inversion (Pokle et al., 2022).
  • Discrete/Non-Gaussian and Constrained-domain Diffusion: apply to categorical or structured data, e.g., via bridges or h-transforms (Liu et al., 2022, Liu et al., 2023).

Representative sampling pseudocode for the stochastic DDPM sampler (Li et al., 2023):

Sample Y_T ~ N(0, I)
for t = T, ..., 1:
    Sample Z_t ~ N(0, I)
    Y_{t-1} ← (Y_t + (1 − α_t) · s_t(Y_t)) / √α_t + σ_t · Z_t
return Y_0
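A runnable NumPy version of the sampler above, assuming a callable score function; the choice $\sigma_t = \sqrt{\beta_t}$ and the noise-free final step are common conventions adopted here for concreteness, not prescriptions from the cited analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def ddpm_sample(score_fn, betas, dim):
    """Stochastic DDPM-style sampler following the pseudocode above.
    Arrays are 0-indexed: betas[t] corresponds to step t+1 of the schedule."""
    alphas = 1.0 - betas
    y = rng.standard_normal(dim)                                   # Y_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        z = rng.standard_normal(dim) if t > 0 else np.zeros(dim)   # no noise at the last step
        y = (y + (1.0 - alphas[t]) * score_fn(y, t)) / np.sqrt(alphas[t]) \
            + np.sqrt(betas[t]) * z
    return y

# Toy usage: if the data were N(0, I), the true score at every step is s(y, t) = -y.
betas = np.linspace(1e-4, 0.02, 1000)                              # illustrative schedule
sample = ddpm_sample(lambda y, t: -y, betas, dim=8)
```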

4. Theoretical Properties and Convergence

Recent research has provided non-asymptotic, finite-sample convergence rates for discrete-time diffusion samplers (Li et al., 2023). With $T$ sampling steps and access to accurate score estimates:

  • Probability-flow ODE sampler: convergence in total-variation (TV) distance at rate $O(1/T)$, improving to $O(1/T^2)$ under further acceleration (bias/variance corrections).
  • Stochastic DDPM sampler: TV (and KL) convergence at rate $O(1/\sqrt{T})$, boosted to $O(1/T)$ with variance correctors.

The TV bounds scale polynomially in the data dimension $d$ and depend linearly on the mean squared error between the learned and the true score. No global smoothness or log-Sobolev assumptions are required; only boundedness of the forward process is necessary (Li et al., 2023).

In energy-based diffusion models, explicit modeling of the log-prior energy enables true MH corrections, ensuring unbiased posterior sampling even under weak likelihood or strong prior regimes (Diao et al., 25 Oct 2025).
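For background on how such a correction operates mechanically, the following is a generic Metropolis–Hastings step for a target density proportional to $\exp(-E(x))$ with a symmetric Gaussian random-walk proposal; it illustrates the acceptance rule only and is not the specific correction scheme of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, energy_fn, step_size=0.1):
    """One Metropolis-Hastings step targeting p(x) proportional to exp(-E(x)).
    With a symmetric proposal, accept with probability min(1, exp(E(x) - E(x')))."""
    proposal = x + step_size * rng.standard_normal(x.shape)
    log_accept = energy_fn(x) - energy_fn(proposal)
    if np.log(rng.uniform()) < log_accept:
        return proposal, True
    return x, False

# Toy target: standard Gaussian, E(x) = ||x||^2 / 2.
energy_fn = lambda x: 0.5 * np.sum(x**2)
x = np.zeros(2)
for _ in range(1000):
    x, accepted = mh_step(x, energy_fn)
```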

Variants based on bridge processes, e.g., for discrete or constrained domains, yield non-asymptotic KL divergence error bounds combining discretization and statistical estimation errors (Liu et al., 2022).

5. Practical Implementations and Applications

EDMs demonstrate broad versatility and have been instantiated in diverse domains:

  • Image, video, and audio synthesis: EDMs trained in the raw or spectrogram domain (e.g., EDMSound) achieve state-of-the-art fidelity with accelerated ODE sampling (e.g., DPM-Solver), as demonstrated by Fréchet Audio Distance (FAD) and FID benchmarking (Zhu et al., 2023).
  • Scientific and engineering inverse problems: Energy-based EDMs with MH correction provide robust posterior sampling for high-dimensional parameter estimation in MIMO channel estimation, outperforming conventional DMs and other baselines in normalized MSE, even under limited pilot overhead (Diao et al., 25 Oct 2025).
  • Structured data, graphs, and molecules: Graph-structured EDMs employ noise-perturbed adjacency matrices or node labels and learn permutation-equivariant GNNs for score prediction, achieving SOTA in molecular conformation and design (Liu et al., 2023).
  • Materials science: Denoising diffusion models reconstruct complex microstructures with minimal hand-engineering, accurately matching real data in spatial statistics and grain-size distribution (Lee et al., 2022).
  • Speech enhancement: Unsupervised STFT-domain diffusion with EM posterior sampling yields competitive speech denoising, generalizing robustly to mismatched or unseen noise distributions (Nortier et al., 2023).

Modern architectures typically leverage U-Net (or, for graphs, GNN) backbones with step/noise-level conditioning, self-attention, and classifier-free guidance for conditional generation (Zhu et al., 2023, Yeğin et al., 2024). Convex optimization formulations of the shallow-network DSM objective yield exact solutions and non-asymptotic convergence guarantees for two-layer ReLU networks (Zhang et al., 2024).
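Classifier-free guidance, mentioned above, combines conditional and unconditional noise predictions at sampling time. A minimal sketch of the standard combination rule, where `eps_model` is a hypothetical callable and `cond=None` denotes the unconditional branch:

```python
def cfg_eps(eps_model, x_t, t, cond, guidance_scale=3.0):
    """Classifier-free guidance: eps = eps_uncond + w * (eps_cond - eps_uncond).
    guidance_scale w = 1 recovers the purely conditional prediction; larger w
    trades sample diversity for conditional fidelity."""
    eps_uncond = eps_model(x_t, t, None)
    eps_cond = eps_model(x_t, t, cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```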

6. Generalizations, Limitations, and Future Directions

EDMs admit broad generalizations:

  • Energy-based parameterizations and compositional energy priors: Explicit modeling of energy enables simultaneous incorporation of multiple constraints via MH correction, with applications in compositional generation across vision, audio, and scientific computing (Diao et al., 25 Oct 2025).
  • Flexible SDE parameterizations: Learning the spatial part of the forward SDE (e.g., Riemannian metric, Hamiltonian twist) extends the family of EDMs beyond fixed VP/VE SDEs, leading to unified and potentially better-optimized models (Du et al., 2022).
  • Bridge processes and constrained-domain extensions: Conditioning diffusion processes on endpoint or constraint sets enables direct modeling of discrete, categorical, and manifold data (Liu et al., 2022, Liu et al., 2023).
  • Accelerated and equilibrium sampling: Parallelized fixed-point solvers, high-order ODE/implicit samplers, or knowledge-distilled few-step students have addressed sampling speed bottlenecks, particularly in real-time or embedded contexts (Pokle et al., 2022, Zhu et al., 2023).
  • Discrete data and non-Gaussian noise: Categorical or non-Gaussian corruption models extend EDMs to sequences, graphs, and structured data (Yeğin et al., 2024, Liu et al., 2023).

Limitations include high sampling cost (mitigated by accelerated samplers), lack of one-size-fits-all corruption kernels for general discrete data, difficulties in likelihood evaluation on structured data, and open questions on the tightness of theoretical bounds (especially in high dimensions). Evaluation metrics can inadvertently miss memorization or mode collapse in high-capacity models (Yeğin et al., 2024, Zhu et al., 2023).

Open theoretical directions focus on tight ELBO–NLL gaps, generalization in data-sparse regimes, and unification of statistical and algorithmic error analysis. Applications continue to broaden, including scientific inverse problems, classifier augmentation in imbalanced data, and compositional tasks requiring multi-energy integration (Diao et al., 25 Oct 2025, Le, 2024).


Summary Table: Key Algorithmic and Theoretical Properties

| Aspect | EDMs (classic/energy-based) | Recent Theoretical Insights |
|---|---|---|
| Forward Mapping | Gaussian Markov chain/SDE | Abstract SDE, flexible metrics |
| Reverse Process | NN-parameterized score or mean | ODE/SDE, energy-function, bridge |
| Training Objective | ELBO, DSM, Fisher divergence | Convexification, finite-sample guarantees |
| Sampling Algorithm | Markov, ODE (DDIM/DPM-Solver), MH-corrected | Accelerated non-asymptotic TV bounds |
| Domain Adaptation | Images, audio, graphs, scientific | Discrete, constrained, compositional |
| Notable Results | SOTA generation, inverse sampling, microstructure reconstruction | $O(1/T)$ / $O(1/T^2)$ TV rates, EM-style bridges |

For comprehensive reviews and technical depth, see (Diao et al., 25 Oct 2025, Li et al., 2023, Yeğin et al., 2024, Huang et al., 2021, Zhu et al., 2023), and (Le, 2024).
