Score-Based Diffusion Models

Updated 3 June 2026

Score-Based Diffusion Models are generative models that learn the gradient of the log-density of progressively noised data, enabling effective reverse diffusion sampling.
They extend methods like DDPMs and annealed Langevin dynamics to achieve state-of-the-art performance in image, audio, inverse imaging, and molecular sampling.
The framework draws on SDE/ODE formulations, score matching, and Malliavin calculus to support theoretical guarantees and scalable algorithms.

Score-Based Diffusion Models (SBDMs) are a framework for generative modeling in which one learns the score (the gradient of the log-density) of progressively noised versions of a data distribution, and then synthesizes new samples by approximately reversing this diffusion process via stochastic (SDE) or deterministic (ODE) dynamics. This approach encompasses and extends denoising diffusion probabilistic models (DDPMs), annealed Langevin dynamics, and recent diffusion normalizing flows. SBDMs have achieved state-of-the-art performance across a wide range of tasks, including image and audio synthesis, conditional generation, inverse imaging, Bayesian inference, molecular sampling, and high-dimensional function-space modeling (Tang et al., 2024, Song et al., 2021, Mirafzali et al., 21 Mar 2025, Lim et al., 2023, Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023).

1. Mathematical Formulation and Key Components

The SBDM framework is characterized by three central components:

Forward Diffusion SDE: A (possibly infinite-dimensional) stochastic differential equation (SDE) that gradually transforms a data sample $x_0\sim p_{\text{data}}$ into pure noise:

$d x_t = f(x_t, t)\, dt + g(t)\, d w_t\,, \quad x_0 \sim p_{\text{data}}$

where $f$ is a drift (often mean-reverting towards 0), $g$ is a diffusion coefficient, and $w_t$ is standard Brownian motion. For function-valued data or solutions to PDEs, the process can be formulated as a linear SPDE on a Hilbert space, e.g., $du(t) = A u(t)\,dt + Q^{1/2}\,dW_t$ with operator-theoretic diffusion (Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023).

Score Function and Score Matching: For each $t$, the score $\nabla_x \log p_t(x)$ of the noise-perturbed density $p_t(x)$ is approximated by a neural network (or operator network in infinite dimensions) $s_\theta(x,t)$, trained to minimize the expected squared difference between $s_\theta(x,t)$ and the true score. Denoising score matching (DSM) leverages access to the conditional transition $p_{t|0}(x_t \mid x_0)$, often Gaussian, yielding a tractable, closed-form training loss (Tang et al., 2024).
Reverse-Time SDE (and Probability-Flow ODE): The time-reversal of the forward SDE induces a drift involving the (unknown) score:

$d x_t = \left[f(x_t, t) - g(t)^2\, \nabla_x \log p_t(x_t)\right] dt + g(t)\, d\bar{w}_t$

where $\bar{w}_t$ is a Brownian motion running backward in time. The learned score $s_\theta(x,t)$ is substituted for $\nabla_x \log p_t(x)$ for sampling. The probability-flow ODE replaces the SDE by a deterministic flow:

$\frac{d x_t}{dt} = f(x_t, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x_t)$

Exactly solving the ODE enables likelihood evaluation and deterministic mapping between latent and data spaces (Tang et al., 2024, Song et al., 2021).
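
The reverse-time sampling step described above can be sketched on a toy problem. The following is a minimal illustration, not a reference implementation: it assumes 1-D Gaussian data and a variance-preserving forward SDE with constant rate (the names `BETA`, `T`, `DATA_STD`, `marginal_var`, and `reverse_sde_sample` are hypothetical), so that the exact score of $p_t$ is available in closed form and stands in for a trained network $s_\theta$.

```python
import numpy as np

# Assumed toy setup (not from the source): x_0 ~ N(0, DATA_STD**2) and the
# forward SDE  dx = -0.5*BETA*x dt + sqrt(BETA) dw,  i.e. f = -0.5*BETA*x, g = sqrt(BETA).
BETA, T, DATA_STD = 2.0, 3.0, 0.5

def marginal_var(t):
    """Variance of the noised marginal p_t for Gaussian data under this SDE."""
    decay = np.exp(-BETA * t)
    return DATA_STD**2 * decay + (1.0 - decay)

def true_score(x, t):
    """Exact score of the Gaussian marginal p_t; stands in for a learned s_theta."""
    return -x / marginal_var(t)

def reverse_sde_sample(n_samples=20000, n_steps=500, seed=0):
    """Euler-Maruyama discretization of the reverse-time SDE, from t=T down to t=0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Initialize from p_T, which is close to N(0, 1) for this SDE.
    x = rng.normal(0.0, np.sqrt(marginal_var(T)), size=n_samples)
    for i in range(n_steps):
        t = T - i * dt
        drift = -0.5 * BETA * x - BETA * true_score(x, t)  # f - g^2 * score
        # Stepping backward in time flips the sign of the drift increment.
        x = x - drift * dt + np.sqrt(BETA * dt) * rng.normal(size=n_samples)
    return x

samples = reverse_sde_sample()
```

In this setting the empirical standard deviation of `samples` should land close to `DATA_STD`, up to discretization and Monte Carlo error. With a learned score the same loop applies, with `true_score` replaced by the network.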

2. Score Matching, Likelihood Training, and Sampling

The canonical training procedure minimizes a weighted Fisher divergence (score-matching loss), whose minimizer coincides with the true score function:

$\mathcal{L}_{\text{SM}}(\theta) = \mathbb{E}_{t}\left[\lambda(t)\, \mathbb{E}_{x_t \sim p_t} \left\| s_\theta(x_t, t) - \nabla_x \log p_t(x_t) \right\|^2\right]$

where $\lambda(t) > 0$ is a time-dependent weight. Denoising score matching (DSM) replaces the inaccessible $\nabla_x \log p_t(x_t)$ with the score of the conditional transition, $\nabla_{x_t} \log p_{t|0}(x_t \mid x_0)$:

$\mathcal{L}_{\text{DSM}}(\theta) = \mathbb{E}_{t}\left[\lambda(t)\, \mathbb{E}_{x_0 \sim p_{\text{data}}}\, \mathbb{E}_{x_t \sim p_{t|0}(\cdot \mid x_0)} \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p_{t|0}(x_t \mid x_0) \right\|^2\right]$

For Gaussian forward kernels, this reduces to a simple regression against known closed-form targets.
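
A small numerical check can make the DSM claim concrete. The sketch below assumes a 1-D Gaussian data distribution and a single fixed noise level (the constants `DATA_STD`, `ALPHA`, `SIGMA` and the function `fit_linear_score_dsm` are hypothetical). It fits a linear score model by least squares against the conditional-score target and compares the result with the exact marginal score, which is linear in this Gaussian case.

```python
import numpy as np

# Assumed toy setup: x_0 ~ N(0, DATA_STD**2) and a Gaussian forward kernel
#   x_t | x_0 ~ N(ALPHA * x_0, SIGMA**2)  at one fixed time t.
DATA_STD, ALPHA, SIGMA = 0.5, 0.8, 0.6

def fit_linear_score_dsm(n=200_000, seed=0):
    """Fit s(x) = a * x by least squares on the denoising score matching target."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, DATA_STD, size=n)
    eps = rng.normal(size=n)
    xt = ALPHA * x0 + SIGMA * eps
    # Conditional score of the Gaussian kernel: -(x_t - ALPHA*x_0)/SIGMA**2 = -eps/SIGMA.
    target = -eps / SIGMA
    # Closed-form minimizer of mean((a * xt - target)**2) over the scalar a.
    return np.sum(xt * target) / np.sum(xt * xt)

a_dsm = fit_linear_score_dsm()
# The marginal p_t is N(0, ALPHA^2 * DATA_STD^2 + SIGMA^2), so its score is a_true * x.
a_true = -1.0 / (ALPHA**2 * DATA_STD**2 + SIGMA**2)
```

Although the regression never sees the marginal score, `a_dsm` should agree with `a_true` up to Monte Carlo error, which is the property that makes DSM a usable substitute for the intractable objective.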

Likelihood-based training is possible via the continuous normalizing flow view, using the probability-flow ODE and the instantaneous change-of-variables formula. Choosing the likelihood weighting $\lambda(t) = g(t)^2$ as the time-dependent weight in the score-matching objective ensures that the objective upper bounds the negative log-likelihood (NLL) of the SDE-based model up to a constant independent of $\theta$, yielding density estimators whose likelihoods are competitive with autoregressive models (Song et al., 2021).
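
The change-of-variables computation can be illustrated on a case where the answer is known. This sketch assumes 1-D Gaussian data, a constant-rate variance-preserving SDE, and the exact score in place of a learned one (all names and constants are hypothetical). Because the probability-flow drift is linear here, its divergence is just the scalar coefficient `ode_coeff(t)`, so no trace estimator is needed.

```python
import numpy as np

BETA, T, DATA_STD = 2.0, 3.0, 0.5   # assumed toy values, not from the source

def marginal_var(t):
    """Variance of p_t for x_0 ~ N(0, DATA_STD**2) under dx = -0.5*BETA*x dt + sqrt(BETA) dw."""
    decay = np.exp(-BETA * t)
    return DATA_STD**2 * decay + (1.0 - decay)

def ode_coeff(t):
    """Probability-flow drift is c(t)*x: f - 0.5*g^2*score with the exact score -x/v(t)."""
    return 0.5 * BETA * (1.0 / marginal_var(t) - 1.0)

def log_normal(x, var):
    """Log-density of N(0, var) at x."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * x**2 / var

def ode_log_likelihood(x0, n_steps=20000):
    """Euler-integrate the state and the drift divergence from t=0 to t=T."""
    dt = T / n_steps
    x, div_integral = float(x0), 0.0
    for i in range(n_steps):
        c = ode_coeff(i * dt)
        div_integral += c * dt   # divergence of c(t)*x in one dimension is c(t)
        x += c * x * dt
    # Instantaneous change of variables: log p_0(x_0) = log p_T(x_T) + int_0^T div(drift) dt
    return log_normal(x, marginal_var(T)) + div_integral

ll_ode = ode_log_likelihood(0.3)
ll_exact = log_normal(0.3, DATA_STD**2)
```

The two values should agree up to the Euler discretization error. In higher dimensions with a neural score, the divergence is typically estimated stochastically rather than computed exactly, which this toy does not cover.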

Sampling employs discretized SDE or ODE solvers (Euler–Maruyama, predictor–corrector, Runge–Kutta). ODE-based sampling, or one-shot “consistency models”, can produce competitive sample quality in fewer steps, but may involve more complex training (Tang et al., 2024, Na et al., 2024).
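
As a rough illustration of deterministic sampling with a Runge–Kutta scheme, the sketch below applies Heun's second-order method to the probability-flow ODE. It again assumes 1-D Gaussian data, a constant-rate variance-preserving SDE, and the exact score as a stand-in for a trained model; `heun_sample` and the constants are hypothetical names.

```python
import numpy as np

BETA, T, DATA_STD = 2.0, 3.0, 0.5   # assumed toy values, not from the source

def marginal_var(t):
    """Variance of p_t for x_0 ~ N(0, DATA_STD**2) under dx = -0.5*BETA*x dt + sqrt(BETA) dw."""
    decay = np.exp(-BETA * t)
    return DATA_STD**2 * decay + (1.0 - decay)

def pf_ode_drift(x, t):
    """Probability-flow drift f - 0.5*g^2*score, using the exact Gaussian score."""
    score = -x / marginal_var(t)
    return -0.5 * BETA * x - 0.5 * BETA * score

def heun_sample(n_samples=20000, n_steps=50, seed=0):
    """Integrate the probability-flow ODE from t=T to t=0 with Heun's method."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.normal(0.0, np.sqrt(marginal_var(T)), size=n_samples)
    for i in range(n_steps):
        t = T - i * dt
        d1 = pf_ode_drift(x, t)
        x_pred = x - d1 * dt                  # Euler predictor, stepping backward in time
        d2 = pf_ode_drift(x_pred, t - dt)
        x = x - 0.5 * (d1 + d2) * dt          # trapezoidal corrector
    return x

ode_samples = heun_sample()
```

The only randomness is in the initial draw from $p_T$; the map from latent to sample is deterministic. In this toy the sample standard deviation stays close to `DATA_STD` with 50 steps (two score evaluations each), which is the kind of step-count saving the ODE view is meant to offer.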

3. Extensions: Function Space, Infinite Dimensions, and Operator-Valued Models

Recent work has rigorously extended SBDMs to function spaces and infinite-dimensional Hilbert spaces—essential for scientific computing, inverse problems, and modeling of PDE solutions (Lim et al., 2023, Hagemann et al., 2023, Mirafzali et al., 27 Aug 2025, Baker et al., 28 Jan 2026). Key advances include:

Infinite-Dimensional Forward Diffusion: For $u(t)$ in a Hilbert space $H$, define an SPDE $du(t) = A u(t)\,dt + Q^{1/2}\,dW_t$ with $Q$ trace-class, preserving spatially correlated (colored) noise and well-posedness in arbitrary dimensions (Mirafzali et al., 27 Aug 2025).
Closed-Form Infinite-Dimensional Score: Via infinite-dimensional Malliavin calculus and Bismut–Elworthy–Li formulas, exact expressions for the Fréchet derivative of the log-density are obtained, avoiding finite-dimensional projections, with explicit formulas for the Malliavin covariance operator (Mirafzali et al., 27 Aug 2025, Mirafzali et al., 21 Mar 2025).

Operator-Valued Networks and Multilevel Training: Approximation of the infinite-dimensional score is accomplished via Fourier Neural Operators or multilevel U-Net-style operator networks, structured for mesh-independent generalization (Hagemann et al., 2023). A telescopic training loss ensures convergence and adapts across spatial resolutions.
Posterior Conditioning and Guidance: In Bayesian inverse problems, infinite-dimensional extensions of Doob's h-transform enable conditioning SBDMs on observations. The conditional score decomposes as $\nabla_x \log p_t(x \mid y) = \nabla_x \log p_t(x) + \nabla_x \log h_t(x)$, the sum of the unconditional score and a guidance term with $h_t(x) = p(y \mid x_t = x)$, and simulation-free supervised guidance training recovers the guidance term for posterior sampling (Baker et al., 28 Jan 2026).
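
To give a feel for the linear SPDE setting, the following sketch works in a truncated eigenbasis in which both operators are assumed diagonal: $A e_k = -a_k e_k$ with $a_k = k^2$ and $Q e_k = q_k e_k$ with $q_k = k^{-2}$, so that $\sum_k q_k < \infty$ (trace-class). These spectra, the truncation level `K`, and all function names are illustrative assumptions rather than the cited papers' setup. In this additive-noise case each mode is an Ornstein–Uhlenbeck process, the Malliavin covariance coincides with the covariance of the Gaussian transition, and the conditional score is available mode by mode.

```python
import numpy as np

K = 6                                   # assumed truncation level for illustration
k = np.arange(1, K + 1).astype(float)
a = k**2                                # eigenvalues of -A (Laplacian-like decay rates)
q = 1.0 / k**2                          # eigenvalues of Q; summable, hence trace-class

def malliavin_cov(t):
    """Per-mode covariance of u(t) given u(0): q_k * (1 - exp(-2 a_k t)) / (2 a_k)."""
    return q * (1.0 - np.exp(-2.0 * a * t)) / (2.0 * a)

def conditional_score(u, u0, t):
    """Score of the Gaussian transition in each mode: -(u - e^{-a t} u0) / gamma(t)."""
    return -(u - np.exp(-a * t) * u0) / malliavin_cov(t)

def simulate(u0, t, n_paths=20000, dt=1e-3, seed=0):
    """Euler-Maruyama simulation of du_k = -a_k u_k dt + sqrt(q_k) dW_k for all modes."""
    rng = np.random.default_rng(seed)
    u = np.tile(u0, (n_paths, 1))
    for _ in range(int(round(t / dt))):
        u = u - a * u * dt + np.sqrt(q * dt) * rng.normal(size=u.shape)
    return u

u0 = 1.0 / k
paths = simulate(u0, t=0.5)
emp_var = paths.var(axis=0)             # compare with malliavin_cov(0.5)
```

The empirical per-mode variance should track `malliavin_cov(0.5)`, and the rapid decay of $q_k / a_k$ shows how high-frequency modes receive little noise, which is what keeps the process well-defined as `K` grows.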

4. Practical Algorithms, Theoretical Guarantees, and Variants

SBDMs admit a spectrum of algorithmic and theoretical refinements:

Score Decomposition and Manifold Optimization: Recent models decompose the score into normal (denoising) and tangent (content refinement) directions on reference manifolds, facilitating Pareto-efficient, multi-objective image-to-image translation (Sun et al., 2023).
Flexible Forward SDEs: Beyond fixed SDEs, the forward process can be parameterized by a position-dependent Riemannian metric and Hamiltonian/symplectic drift, guaranteeing normalizable stationary laws and allowing for data-adaptive geometries (Du et al., 2022).
Dimension-Free Sample Complexity and Variance Reduction: It is possible to learn a single score network across timesteps with nearly dimension-free generalization, proven via martingale error decompositions and variance-minimizing bootstrapped targets (Kumar et al., 14 Feb 2025). When the data lie near a $k$-dimensional manifold in $\mathbb{R}^d$, careful scheduling and coefficient design enable discretization error bounds scaling with $k$ rather than $d$ (Li et al., 2024).
Malliavin Calculus for Score Computation: Analytical score formulas via Malliavin calculus coincide with the Fokker–Planck solution for linear SDEs and generalize to nonlinear, state-independent cases, lowering estimator variance in highly nonlinear or multimodal settings (Mirafzali et al., 21 Mar 2025).
Reward-Directed and RL-Tuned Diffusion: Treating score selection as a control policy allows reinforcement learning-based fine-tuning for reward maximization under entropy regularization. The optimal stochastic policy is always Gaussian, with closed-form mean and covariance, and practical estimation is achieved via actor-critic q-learning (Gao et al., 2024, Tang et al., 2024).
Posterior Inference and Inverse Problems: SBDMs serve as powerful priors for Bayesian image reconstruction and general inverse problems. Inference combines SDE sampling with measurement-gradient conditioning, variational flows (DPI), or projection steps for data consistency (McCann et al., 2023, Feng et al., 2023, Chung et al., 2021).
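
Measurement-gradient conditioning rests on Bayes' rule applied at each noise level: the posterior score equals the prior score plus the gradient of the log-likelihood of the measurement given the noised state. The sketch below checks this identity in a 1-D linear-Gaussian toy problem where every term has a closed form. The model ($y = x_0 + \text{noise}$, a single noise level) and all names are assumptions made for illustration, not the cited methods.

```python
import numpy as np

# Assumed toy model: x_0 ~ N(0, DATA_STD^2), x_t = ALPHA*x_0 + SIGMA*eps,
# and a direct noisy measurement y = x_0 + OBS_STD*eta.
DATA_STD, ALPHA, SIGMA, OBS_STD = 0.5, 0.8, 0.6, 0.3

VAR_T = ALPHA**2 * DATA_STD**2 + SIGMA**2   # Var(x_t)
VAR_Y = DATA_STD**2 + OBS_STD**2            # Var(y)
COV_TY = ALPHA * DATA_STD**2                # Cov(x_t, y)

def prior_score(x):
    """Unconditional score of p_t(x) = N(0, VAR_T)."""
    return -x / VAR_T

def log_likelihood(y, x):
    """log p(y | x_t = x), obtained by Gaussian conditioning."""
    mean = COV_TY / VAR_T * x
    var = VAR_Y - COV_TY**2 / VAR_T
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

def guidance(y, x, h=1e-5):
    """Measurement gradient d/dx log p(y | x_t = x) by central finite differences."""
    return (log_likelihood(y, x + h) - log_likelihood(y, x - h)) / (2.0 * h)

def posterior_score(y, x):
    """Exact score of p(x_t | y), used only as ground truth for the check."""
    mean = COV_TY / VAR_Y * y
    var = VAR_T - COV_TY**2 / VAR_Y
    return -(x - mean) / var

y_obs, x_query = 0.7, -0.2
gap = prior_score(x_query) + guidance(y_obs, x_query) - posterior_score(y_obs, x_query)
```

Here `gap` is zero up to floating-point error. In realistic problems the likelihood of the measurement given a noised state is not available in closed form, which is why the cited approaches approximate or learn the guidance term.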

5. Sampling Efficiency, Evaluation, and Empirical Results

Efficiency and effectiveness of SBDMs are advanced by several techniques:

Score Embedding and PDE-Based Pre-computation: Solving the log-density Fokker–Planck equation numerically in advance and embedding the computed score into training accelerates convergence, reducing the number of epochs and data required for high-fidelity denoising (Na et al., 2024).
Importance Sampling for Boltzmann Distributions: Post-training methods such as Variance-Tuned Diffusion Importance Sampling (VT-DIS) overcome bias in learned samplers via trajectory-wise reweighting, yielding unbiased estimates with high effective sample size at negligible test-time overhead (Zhang et al., 27 May 2025).
Ensemble Score Filters for SPDEs: In data assimilation for SPDEs, ensemble-based score filters offer real-time, training-free posterior inference, competitive with (or exceeding) particle and Kalman-type filters under sparse and noisy observations (Huynh et al., 9 Aug 2025).
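
The reweighting idea and the effective sample size (ESS) diagnostic can be shown with plain self-normalized importance sampling. This is a generic sketch and not VT-DIS itself: it reweights individual samples rather than trajectories, and the target, proposal, and function names are all assumptions chosen so the correct answer is known.

```python
import numpy as np

def log_normal(x, std):
    """Log-density of N(0, std**2) at x."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * (x / std) ** 2

def reweight(samples, log_target, log_proposal):
    """Self-normalized importance weights and the resulting effective sample size."""
    log_w = log_target(samples) - log_proposal(samples)
    log_w = log_w - log_w.max()          # stabilize the exponentiation
    w = np.exp(log_w)
    w = w / w.sum()
    ess = 1.0 / np.sum(w**2)             # equals n when all weights are equal
    return w, ess

# Hypothetical mismatch: an imperfect sampler draws from N(0, 1.3^2),
# while the target distribution is N(0, 1).
PROPOSAL_STD = 1.3
rng = np.random.default_rng(0)
xs = rng.normal(0.0, PROPOSAL_STD, size=50_000)
weights, ess = reweight(
    xs,
    lambda x: log_normal(x, 1.0),
    lambda x: log_normal(x, PROPOSAL_STD),
)
second_moment = np.sum(weights * xs**2)  # weighted estimate of E[x^2] under the target
```

The unweighted second moment of `xs` would be near `PROPOSAL_STD**2`, whereas the weighted estimate should be near 1. The ESS shrinks as the sampler and target drift apart, which is why a high ESS is used as evidence that the learned sampler is already close to the target.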

Reported benchmark results include:

Task	Metric (lower is better unless noted)	SBDM Result	Baseline
CIFAR-10 Gen.	FID	2.83–3.13	2.90–2.95
ImageNet 32x32 Gen.	NLL (bits/dim)	3.76	3.77–3.86
MNIST SDF Gen.	FID (256x256)	21.9	23.9 (GANO)
MRI Recon. (fastMRI)	PSNR (dB, higher is better)	2–10 dB above TV	U-Net, TV
Bayesian Inversion	RMSE / ES / FID / PSNR / SSIM	Best-in-class	TV, RealNVP

Significant speedups (3–10x) have been reported when using score-embedding and functional operator approaches, especially in high-resolution settings (Na et al., 2024, Hagemann et al., 2023, Lim et al., 2023).

6. Theoretical Insights and Limitations

Theoretical advances underpin SBDMs across function spaces, dimensions, and conditioning:

Operator-theoretic and Malliavin-calculus frameworks rigorously justify infinite-dimensional learning, conditional inference, and functional data regression (Mirafzali et al., 27 Aug 2025, Baker et al., 28 Jan 2026, Hagemann et al., 2023).
Dimension-free sample complexity provides a formal explanation for scalability in high dimensions (Kumar et al., 14 Feb 2025), while coefficient design enables adaptation to low-dimensional structure (Li et al., 2024).
Limitations persist: SBDMs can overfit when noise scales vanish, are sensitive to imperfect scores when used for importance sampling, and remain costly to sample for high-resolution images or long time intervals. Efficient ODE samplers, low-variance estimators, and further architectural advances are active areas of research (Tang et al., 2024, Zhang et al., 27 May 2025, Na et al., 2024).
Practical implementation in function space requires mesh-independent operator-network parameterizations and careful tuning of the discretization and noise schedule (Lim et al., 2023, Hagemann et al., 2023).

7. Outlook and Future Directions

SBDM research is rapidly evolving:

Extensions to video, time series, and general dynamical systems using operator-valued or ensemble-based scores for high-dimensional/functional data (Huynh et al., 9 Aug 2025, Hagemann et al., 2023).
Efficient likelihood evaluation and ultrafast sampling via ODE-based and consistency model frameworks (Tang et al., 2024, Na et al., 2024).
Integration of explicit Bayesian methodology for principled posterior inference in inverse problems and uncertainty quantification (Feng et al., 2023, McCann et al., 2023, Chung et al., 2021, Baker et al., 28 Jan 2026).
Reinforcement learning-based fine-tuning for task-specific sampling, reward-directed generation, and human-aligned model outputs (Gao et al., 2024).
Theoretical developments in maximizing the efficiency of learning in high/low-dimensional regimes, and developing shape-adaptive samplers for structured data support (Kumar et al., 14 Feb 2025, Li et al., 2024).
Adaptive, mesh-agnostic, and physics-aware generative samplers for scientific machine learning and physical modeling (Mirafzali et al., 27 Aug 2025, Lim et al., 2023, Huynh et al., 9 Aug 2025).

Score-Based Diffusion Models therefore constitute a mathematically rigorous, highly flexible, and empirically robust class of generative models, subsuming and advancing traditional score matching, normalizing flows, and denoising diffusions, with ongoing advances in theory, algorithms, and applications.