Score-Based Diffusion Models

Updated 19 March 2026

Score-based diffusion models are deep generative frameworks that reverse stochastic diffusion processes using score matching, enabling precise data synthesis and uncertainty-aware Bayesian inference.
They leverage stochastic differential equations and denoising score matching to train neural estimators for high-dimensional, continuous, and categorical data applications.
Recent advances extend these models to infinite-dimensional spaces and manifold adaptations, improving efficiency, sample quality, and applicability in scientific inverse problems.

Score-based diffusion models are a leading class of deep generative models that define data generation as the time reversal of a stochastic diffusion process, parameterized through the score function—the gradient of the log-density of the evolving distribution. They offer a mathematically rigorous probabilistic formulation for data synthesis, density estimation, and inverse problems, relying on stochastic differential equations (SDEs), score matching, and in practice, neural estimators for high-dimensional data. Originally developed for continuous vector spaces (images), score-based diffusion models have been generalized to accommodate categorical data, function spaces, and infinite dimensions, and have been deployed for uncertainty-aware Bayesian inference in scientific applications.

1. Mathematical Foundations: SDEs, Scores, and Model Construction

A score-based diffusion model begins by specifying a forward diffusion process that gradually corrupts a data sample $x_0\sim p_{\rm data}$, typically as an Itô SDE: $dx_t = f(x_t, t) dt + g(t) dW_t,$ where $f(\cdot,t)$ is the drift term, $g(t)$ the diffusion coefficient, and $W_t$ a standard Brownian motion. Common instantiations include the variance-preserving (VP) and variance-exploding (VE) SDEs; for example, in VE, $f(x,t)=0$ and $g(t)=\sqrt{d\sigma^2(t)/dt}$, with a monotonically increasing noise variance schedule $\sigma^2(t)$ (e.g., geometric, exponential, or data-dependent) (Tang et al., 2024, Lai et al., 2022).
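As an illustration of the VE construction, the following minimal Python sketch uses a geometric schedule $\sigma(t)=\sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$, derives $g(t)$ from it, and simulates the forward SDE with Euler–Maruyama. The constants `SIGMA_MIN` and `SIGMA_MAX`, the function names, and the scalar (1-D) state are assumptions chosen for illustration, not values taken from the cited works.

```python
import math
import random

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed schedule endpoints (illustrative)

def sigma(t):
    """Geometric noise scale sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def g(t):
    """VE diffusion coefficient, defined by g(t)^2 = d sigma^2(t) / dt."""
    return sigma(t) * math.sqrt(2.0 * math.log(SIGMA_MAX / SIGMA_MIN))

def accumulated_variance(t, n_steps=10_000):
    """Midpoint-rule approximation of the integral of g(s)^2 over [0, t]."""
    h = t / n_steps
    return h * sum(g((i + 0.5) * h) ** 2 for i in range(n_steps))

def simulate_forward(x0, n_steps=200, rng=random):
    """Euler-Maruyama for dx = g(t) dW (VE SDE, zero drift) from t=0 to t=1."""
    h = 1.0 / n_steps
    x = x0
    for i in range(n_steps):
        x += g(i * h) * math.sqrt(h) * rng.gauss(0.0, 1.0)
    return x
```

Because the drift is zero, $x_t - x_0$ is Gaussian with variance $\sigma^2(t)-\sigma^2(0)$, which `accumulated_variance(1.0)` should reproduce up to quadrature error.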

The time reversal of this dynamic, as established by Anderson (1982), shows that the generative process is again an SDE, but with an additional drift driven by the score function $\nabla_x\log p_t(x)$: $dx_t = [f(x_t, t) - g^2(t)\nabla_x\log p_t(x_t)] dt + g(t) d\overline{W}_t,$ where $\overline{W}_t$ is a Brownian motion running backward in time and the equation is integrated from the terminal time down to $t=0$. Proper generative sampling thus requires accurate estimation of these scores at all noise levels.
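To make the reverse-time dynamics concrete, here is a small sketch for a toy problem where the score is known in closed form: 1-D Gaussian data $N(M, S_0^2)$ diffused with $f=0$ and constant $g(t)=S$, so that $p_t = N(M, S_0^2 + S^2 t)$. The constants `M`, `S0`, `S` and the function names are hypothetical, and the exact score stands in for a trained network.

```python
import math
import random

M, S0 = 2.0, 0.5   # assumed toy data distribution N(M, S0^2)
S = 3.0            # assumed noise scale: sigma^2(t) = S^2 * t, so g(t) = S

def score(x, t):
    """Exact score of p_t = N(M, S0^2 + S^2 t) for this toy problem."""
    return -(x - M) / (S0 ** 2 + S ** 2 * t)

def reverse_sde_sample(n_steps=200, rng=random):
    """Integrate the reverse SDE from t=1 down to t=0 with Euler-Maruyama."""
    h = 1.0 / n_steps
    # Start from the terminal marginal p_1, which is known exactly here.
    x = rng.gauss(M, math.sqrt(S0 ** 2 + S ** 2))
    for i in range(n_steps, 0, -1):
        t = i * h
        drift = 0.0 - S ** 2 * score(x, t)   # f - g^2 * score, with f = 0
        # Stepping backward in time corresponds to dt = -h.
        x += drift * (-h) + S * math.sqrt(h) * rng.gauss(0.0, 1.0)
    return x
```

Drawing many samples with `reverse_sde_sample` should give an empirical mean near `M` and a standard deviation near `S0`, up to discretization and Monte Carlo error.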

2. Score Estimation and Denoising Score Matching

Since the true score $\nabla_x\log p_t(x)$ is generally intractable, a neural function $s_\theta(x,t)$ is learned via denoising score matching (DSM). DSM leverages the closed-form conditional density $p_{t|0}(x_t\mid x_0)$, which is Gaussian for many SDEs. The loss is: $\mathcal{L}(\theta) = \mathbb{E}_{t}\,\lambda(t)\,\mathbb{E}_{x_0}\,\mathbb{E}_{x_t\mid x_0}\big\|s_\theta(x_t,t) - \nabla_{x_t}\log p_{t|0}(x_t\mid x_0)\big\|^2,$ where typically the weighting $\lambda(t)>0$ is chosen to balance the signal-to-noise ratio across $t$ (Tang et al., 2024, Lai et al., 2022). Training is performed by sampling $t$ uniformly, generating forward noisy samples, and regressing $s_\theta(x_t,t)$ to match the analytic denoising score. This procedure can be augmented for categorical data by matching singleton conditionals ("categorical ratio matching") (Sun et al., 2022).
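The DSM training signal can be sketched in a few lines. The example below is a Monte Carlo estimate of the loss for a scalar state, assuming a Gaussian perturbation kernel $N(x_0, \sigma^2(t))$ (ignoring the small $\sigma^2(0)$ offset), a geometric schedule, and the weighting $\lambda(t)=\sigma^2(t)$; all names and constants are hypothetical. Instead of training a network, it compares two fixed candidate scores on toy Gaussian data to show that the true marginal score attains the lower loss.

```python
import math
import random

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0   # assumed geometric VE schedule
M, S0 = 2.0, 0.5                    # assumed toy data distribution N(M, S0^2)

def sigma(t):
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def dsm_loss(score_fn, data, rng):
    """Monte Carlo DSM loss with weighting lambda(t) = sigma(t)^2."""
    total = 0.0
    for x0 in data:
        t = rng.random()                  # t ~ Uniform[0, 1)
        s = sigma(t)
        xt = x0 + s * rng.gauss(0.0, 1.0) # sample from N(x0, sigma(t)^2)
        target = -(xt - x0) / s ** 2      # analytic conditional (denoising) score
        total += s ** 2 * (score_fn(xt, t) - target) ** 2
    return total / len(data)

def true_score(x, t):
    """Score of the marginal p_t = N(M, S0^2 + sigma(t)^2)."""
    return -(x - M) / (S0 ** 2 + sigma(t) ** 2)

def shifted_score(x, t):
    """A deliberately wrong candidate with the mean shifted by 1."""
    return -(x - (M + 1.0)) / (S0 ** 2 + sigma(t) ** 2)
```

With the same noise draws, `dsm_loss(true_score, ...)` should come out below `dsm_loss(shifted_score, ...)`. The minimum is not zero, since the conditional target is itself noisy around the marginal score.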

Alternative approaches include sliced score matching (using projections and Monte Carlo) (Na et al., 2024), and regularization enforcing the score Fokker–Planck equation to encourage global score consistency (Lai et al., 2022).
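For sliced score matching, the objective $\mathbb{E}_x\mathbb{E}_v[v^\top \nabla_x s_\theta(x)\, v + \tfrac{1}{2}(v^\top s_\theta(x))^2]$ needs only directional derivatives along random projections $v$. The sketch below is a rough illustration under several assumptions: it uses Rademacher projections, replaces the automatic differentiation used in practice with a central finite difference, and evaluates fixed Gaussian candidate scores rather than a trained network. The function names are hypothetical.

```python
import random

def ssm_loss(score_fn, data, rng, eps=1e-4):
    """Sliced score matching: mean of v^T (ds/dx) v + 0.5 * (v^T s)^2 over random v."""
    total = 0.0
    for x in data:
        v = [rng.choice((-1.0, 1.0)) for _ in x]          # Rademacher projection
        x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
        x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
        s = score_fn(x)
        s_plus, s_minus = score_fn(x_plus), score_fn(x_minus)
        # Central difference for the directional derivative v^T (ds/dx) v.
        vjv = sum(vi * (a - b) for vi, a, b in zip(v, s_plus, s_minus)) / (2 * eps)
        vs = sum(vi * si for vi, si in zip(v, s))
        total += vjv + 0.5 * vs ** 2
    return total / len(data)

def gaussian_score(c):
    """Score of N(0, c I): s(x) = -x / c."""
    return lambda x: [-xi / c for xi in x]
```

On samples from a standard normal in two dimensions, `ssm_loss(gaussian_score(1.0), ...)` should be lower than the loss for `gaussian_score(2.0)` or `gaussian_score(0.5)`, since the objective is minimized by the true score.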

3. Reverse Processes, Probability-Flow ODEs, and Sampling Algorithms

The generative process can be instantiated as either a reverse SDE or a deterministic probability-flow ODE: $\frac{dx_t}{dt} = f(x_t, t) - \tfrac{1}{2}g^2(t)\nabla_x\log p_t(x_t).$ The ODE and SDE share the same marginals, but the ODE enables deterministic sampling (e.g., DDIM) and tractable likelihood evaluation via the instantaneous change of variables (Tang et al., 2024, Song et al., 2021). Sampling typically employs discretized Euler–Maruyama or higher-order schemes, and can be further improved with predictor–corrector methods that combine numerical solver steps (predictor) with Langevin MCMC moves (corrector) (Tang et al., 2024). For function spaces, Langevin samplers can be made resolution-invariant via neural operator architectures (Lim et al., 2023).
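The following sketch illustrates both sampler types on a toy problem with a closed-form score (1-D Gaussian data, $f=0$, constant $g(t)=S$). It is a simplified illustration, not the cited authors' implementation: the predictor here is an explicit Euler step of the probability-flow ODE, the corrector is a single unadjusted Langevin step with a fixed step size `eps`, and all names and constants are assumptions.

```python
import math
import random

M, S0, S = 2.0, 0.5, 3.0   # assumed toy setup: data N(M, S0^2), sigma^2(t) = S^2 t

def score(x, t):
    """Exact score of p_t = N(M, S0^2 + S^2 t)."""
    return -(x - M) / (S0 ** 2 + S ** 2 * t)

def ode_step(x, t, h):
    """One explicit Euler step, backward in time, of dx/dt = f - 0.5 g^2 score."""
    velocity = -0.5 * S ** 2 * score(x, t)   # f = 0, g = S
    return x - h * velocity                  # move from time t to time t - h

def ode_sample(x1, n_steps=1000):
    """Deterministically map a terminal point x(1) to x(0)."""
    h = 1.0 / n_steps
    x = x1
    for i in range(n_steps, 0, -1):
        x = ode_step(x, i * h, h)
    return x

def pc_sample(rng, n_steps=200, eps=0.01):
    """Predictor-corrector: ODE predictor step, then one Langevin corrector step."""
    h = 1.0 / n_steps
    x = rng.gauss(M, math.sqrt(S0 ** 2 + S ** 2))   # draw from p_1
    for i in range(n_steps, 0, -1):
        x = ode_step(x, i * h, h)                   # predictor
        t_new = (i - 1) * h
        # Corrector: Langevin move targeting the marginal at the new time.
        x += eps * score(x, t_new) + math.sqrt(2 * eps) * rng.gauss(0.0, 1.0)
    return x
```

For this linear toy problem the ODE map has a closed form, $x(0) = M + (x(1)-M)\,S_0/\sqrt{S_0^2+S^2}$, which gives a simple check on `ode_sample`. The same terminal point always yields the same output, in contrast to the stochastic `pc_sample`.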

In the discrete (categorical) domain, sampling is achieved via a reverse continuous-time Markov chain where "scores" are given by probability ratios of singleton conditionals rather than gradients (Sun et al., 2022).

4. Theoretical Guarantees, Likelihood Evaluation, and Model Extensions

Score-based diffusion models are theoretically linked to maximum likelihood estimation (MLE). With the likelihood weighting $\lambda(t) = g^2(t)$, the DSM objective upper bounds the negative log-likelihood of data under the generative model (up to an additive constant independent of the model parameters), and the probability-flow ODE enables direct log-likelihood computation (Song et al., 2021, Feng et al., 2023). The ODE’s divergence term can be efficiently estimated with Hutchinson’s trace estimator.
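Hutchinson's estimator approximates the divergence (the trace of the Jacobian) of a vector field as the average of $v^\top J v$ over random probe vectors $v$ with identity covariance. The sketch below is a minimal illustration under stated assumptions: it uses Rademacher probes and a central finite difference in place of the Jacobian-vector products that automatic differentiation would provide, and it is checked on a hypothetical linear field whose divergence is known exactly.

```python
import random

def hutchinson_divergence(field, x, n_probes, rng, eps=1e-4):
    """Estimate div field(x) = trace(Jacobian) as the mean of v^T J v over random v."""
    total = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in x]           # Rademacher probe
        x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
        x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
        f_plus, f_minus = field(x_plus), field(x_minus)
        # Central difference approximates the Jacobian-vector product J v.
        jv = [(a - b) / (2 * eps) for a, b in zip(f_plus, f_minus)]
        total += sum(vi * ji for vi, ji in zip(v, jv))
    return total / n_probes

# Assumed toy linear field x -> A x; its divergence is trace(A) = 1 + 3 - 2 = 2.
A = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 1.0],
     [4.0, 0.0, -2.0]]

def linear_field(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]
```

Each probe costs one directional derivative regardless of dimension, which is what makes the estimator attractive for high-dimensional likelihood evaluation. The price is Monte Carlo variance coming from the off-diagonal Jacobian entries.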

Recent advances demonstrate adaptation to low-dimensional data manifolds: for DDPM samplers, careful coefficient schedules yield convergence rates depending only on the intrinsic data dimension $k$ (e.g., $O(k^2/\sqrt{T})$ for $T$ sampling steps, up to logarithmic factors), breaking the previously ambient-dimension-limited scaling (Li et al., 2024). Infinite-dimensional extensions are rigorously justified through measure-theoretic constructions and operator-based neural networks (Lim et al., 2023, Baldassari et al., 2023).

Flexible SDE parameterizations, e.g., learned spatial/noise geometries or symplectic structures, have been introduced to align the diffusion more closely to data geometry and to broaden the class of generative paths (Du et al., 2022).

5. Applications: Image and Scientific Inverse Problems

Score-based diffusion models are now widely adopted for image generation, uncertainty-aware Bayesian inference, and scientific applications.

Bayesian inverse problems: The learned score prior enables principled maximum a posteriori (MAP), minimum mean squared error (MMSE), and full Bayesian posterior sampling for problems such as image denoising, deblurring, phase retrieval, and undersampled MRI/CT reconstruction (McCann et al., 2023, Feng et al., 2023, Chung et al., 2021, Han et al., 2024).
Uncertainty quantification: The generative formulation naturally produces calibrated uncertainty maps by running multiple posterior draws (Chung et al., 2021, Baldassari et al., 2023).
Nowcasting and spatiotemporal prediction: Score-based diffusion models have been employed for high-fidelity, ensemble-aware short-term weather forecasting (nowcasting) from satellite imagery, preserving sharpness beyond conventional deep learning baselines (Chase et al., 15 May 2025).
Physics-based and function-space problems: Score-based frameworks have been extended for SPDE filtering (Huynh et al., 9 Aug 2025), infinite-dimensional Bayesian inference (Baldassari et al., 2023), and operator-based generation (Lim et al., 2023).
Discrete and hybrid data: Score-based discrete diffusion models (SDDMs) generalize the score principle to categorical data via continuous-time Markov jump processes, with unbiased score matching based on singleton conditionals (Sun et al., 2022).
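The posterior-sampling and uncertainty-quantification items above can be illustrated with a toy 1-D denoising problem, $y = x + \text{noise}$, where the posterior score is the sum of the prior score and the likelihood score. This is a simplified sketch under strong assumptions: a Gaussian prior with a closed-form score stands in for a learned score prior, sampling uses plain unadjusted Langevin dynamics rather than the schemes of the cited works, and all names and constants are hypothetical.

```python
import math
import random

PRIOR_MEAN, PRIOR_STD = 0.0, 1.0   # assumed toy prior, standing in for a learned score prior
NOISE_STD = 0.5                    # assumed measurement noise: y = x + N(0, NOISE_STD^2)

def prior_score(x):
    return -(x - PRIOR_MEAN) / PRIOR_STD ** 2

def likelihood_score(x, y):
    """Gradient in x of log N(y; x, NOISE_STD^2)."""
    return (y - x) / NOISE_STD ** 2

def posterior_draw(y, rng, n_steps=300, eps=0.01):
    """Unadjusted Langevin dynamics on the posterior score (prior + likelihood)."""
    x = rng.gauss(PRIOR_MEAN, PRIOR_STD)
    for _ in range(n_steps):
        grad = prior_score(x) + likelihood_score(x, y)
        x += eps * grad + math.sqrt(2 * eps) * rng.gauss(0.0, 1.0)
    return x

def summarize(draws):
    """Posterior mean (MMSE-style estimate) and standard deviation (uncertainty)."""
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    return mean, math.sqrt(var)
```

For the measurement $y = 1.2$, the analytic posterior is Gaussian with mean $0.96$ and variance $0.2$, so `summarize` applied to many `posterior_draw` results should land close to those values. In imaging applications the same per-pixel statistics over repeated draws give the uncertainty map.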

6. Algorithmic Innovations and Computational Considerations

Major computational advances involve acceleration, sample quality, and generality:

Score embedding: Embedding numerically solved Fokker–Planck scores reduces the number of training epochs, with a reported 5–10× reduction in training time at similar denoising performance (Na et al., 2024).
Adaptive diffusion time and auxiliary bridging: Shorter diffusion durations, bridged by auxiliary models, reduce training and sampling cost without loss of log-likelihood or sample quality (Franzese et al., 2022).
Ensemble and uncertainty: The stochasticity of diffusion model sampling is leveraged for ensemble generation and spread–skill calibration, crucial for probabilistic forecasting applications (Chase et al., 15 May 2025).
Function space and discretization-invariance: Recent architectures use neural operators or Fourier neural networks to ensure inference and sampling are independent of discretization, extending score-based inference to arbitrary mesh resolutions (Baldassari et al., 2023, Lim et al., 2023).
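The spread–skill comparison mentioned for ensemble forecasting can be sketched as follows. This uses one common convention (spread as the root-mean ensemble variance, skill as the RMSE of the ensemble mean), applied to scalar forecasts. The function name and the convention are assumptions for illustration and may differ from what the cited work uses.

```python
import math
import random

def spread_skill(ensembles, truths):
    """Return (spread, skill): root-mean ensemble variance and RMSE of the ensemble mean."""
    var_sum, sq_err_sum = 0.0, 0.0
    for members, truth in zip(ensembles, truths):
        mean = sum(members) / len(members)
        # Unbiased ensemble variance for this forecast case.
        var_sum += sum((m - mean) ** 2 for m in members) / (len(members) - 1)
        sq_err_sum += (mean - truth) ** 2
    n = len(truths)
    return math.sqrt(var_sum / n), math.sqrt(sq_err_sum / n)
```

A spread-to-skill ratio near 1 suggests a well-calibrated ensemble, while a ratio well below 1 indicates overconfidence (members cluster too tightly relative to the actual error).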

7. Frontiers: Manifold Adaptation, Self-Consistency, and Open Problems

Score-based diffusion is at the forefront of manifold-aware generative modeling. Approaches such as Manifold Attracted Diffusion (MAD) propose modified inference schemes to collapse small off-manifold noise while preserving on-manifold variability, using extended score operators efficiently computed from pre-trained networks (Elbrächter et al., 29 Sep 2025). Theoretical work identifies essential schedule designs that enable exact adaptation to unknown data manifolds (Li et al., 2024).

A persistent challenge is ensuring the learned scores satisfy the global self-consistency imposed by the score Fokker–Planck equation; regularization methods such as FP-Diffusion have been shown to improve likelihood, conservativity, and sample diversity (Lai et al., 2022).

Open problems include fully automatic adaptation of schedule hyperparameters, higher-order sampling algorithms, connections to manifold learning theory, and rigorous analysis of high-dimensional/global error behavior (Elbrächter et al., 29 Sep 2025, Li et al., 2024). These directions motivate ongoing work linking statistical physics, SDE theory, and practical deep generative modeling.