Score-Based Generative Modeling
- Score-based generative modeling is a framework that transforms noise into data through stochastic differential equations guided by a time-dependent score function.
- It leverages neural networks trained via denoising score matching to approximate the score function, enabling effective reverse SDE sampling with a predictor-corrector scheme.
- The approach also allows deterministic sample generation and exact likelihood evaluation via a probability flow ODE, achieving state-of-the-art results in image synthesis and inverse problems.
Score-based generative modeling defines a unifying framework for constructing deep generative models by formulating sample synthesis as the numerical solution of a stochastic differential equation (SDE) whose drift depends on a time-dependent score function (the gradient of the log-density of the evolving data distribution). The central mechanism connects ideas from diffusion processes, Markov processes, and energy-based models, and provides a mathematically transparent route for transforming noise into data. This approach accommodates a variety of architectures, SDE designs, and sampling routines, and is distinguished by its flexibility, likelihood evaluation capabilities, and state-of-the-art empirical results in image synthesis and other domains.
1. Stochastic Differential Equation Framework
The core generative modeling process begins by defining a forward SDE that progressively injects noise into data samples, transforming a complex data distribution $p_0$ into a tractable prior $\pi$ (frequently a standard multivariate Gaussian). The forward trajectory is given by the Itô SDE

$$\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w},$$

where $\mathbf{f}(\mathbf{x}, t)$ is a drift function, $g(t)$ a time-dependent diffusion coefficient, and $\mathbf{w}$ denotes Brownian motion. For large $t$, the marginal distribution $p_t(\mathbf{x})$ approaches the prior $\pi(\mathbf{x})$.
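As a concrete illustration, here is a minimal sketch of the forward diffusion under the variance-exploding (VE) SDE, which has zero drift ($\mathbf{f} = \mathbf{0}$); the `sigma_min`/`sigma_max` values, the step count, and the toy data are illustrative choices, not prescribed ones.

```python
import math
import torch

def g_ve(t: torch.Tensor, sigma_min: float = 0.01, sigma_max: float = 50.0) -> torch.Tensor:
    """Diffusion coefficient g(t) of the variance-exploding SDE (drift f = 0)."""
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    return sigma_t * math.sqrt(2.0 * math.log(sigma_max / sigma_min))

def forward_sde(x0: torch.Tensor, n_steps: int = 1000) -> torch.Tensor:
    """Euler-Maruyama simulation of dx = g(t) dw from t = 0 to t = 1.

    Noise is injected step by step, so x(1) is approximately Gaussian
    regardless of the distribution of x0.
    """
    x = x0.clone()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.tensor(i * dt)
        x = x + g_ve(t) * math.sqrt(dt) * torch.randn_like(x)
    return x

# Example: diffuse a toy 2-D data cluster into (approximate) prior noise.
data = torch.randn(128, 2) * 0.1 + 3.0
noised = forward_sde(data)
```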
The reverse generation process follows a reverse-time SDE which, given knowledge of the time-dependent score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$, takes the form

$$\mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}},$$

where $\bar{\mathbf{w}}$ is the time-reversed Wiener process. Hence, sample generation entails simulating this SDE from the simple prior at $t = T$ back to $t = 0$, removing noise while being guided by the score ("gradient flow") of the evolving distribution.
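Continuing the VE example above, a minimal Euler–Maruyama integrator for the reverse-time SDE might look as follows; `score_fn(x, t)` is a stand-in for the learned score (the subject of Section 2), and the initial sample is drawn from $\mathcal{N}(\mathbf{0}, \sigma_{\max}^2 \mathbf{I})$, the VE SDE's approximate terminal distribution.

```python
@torch.no_grad()   # sampling needs no gradient graph
def reverse_sde_sample(score_fn, shape, n_steps: int = 1000,
                       sigma_max: float = 50.0) -> torch.Tensor:
    """Integrate dx = [f - g(t)^2 score(x, t)] dt + g(t) dw_bar backwards
    from t = 1 to t = 0 with Euler-Maruyama (f = 0 in the VE case).

    Reuses g_ve from the forward-SDE sketch above.
    """
    x = sigma_max * torch.randn(shape)   # start from the tractable prior
    dt = 1.0 / n_steps
    for i in reversed(range(n_steps)):
        t = torch.tensor((i + 1) * dt)
        g = g_ve(t)
        # Stepping from t to t - dt: the score-driven drift pushes toward
        # high density while fresh noise g * sqrt(dt) * z is re-injected.
        x = x + (g ** 2) * score_fn(x, t) * dt + g * math.sqrt(dt) * torch.randn_like(x)
    return x
```

With a trained score model plugged in as `score_fn`, this is the basic reverse-time sampler; the predictor-corrector variant of Section 3 adds Langevin correction steps at each time level.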
2. Score Function Estimation via Neural Networks
The intractability of the true score function for complex data distributions necessitates learning an approximation $s_\theta(\mathbf{x}, t)$ with a neural network. Training is performed using denoising score matching, an instance of the generalized score-matching paradigm, by minimizing the objective

$$\theta^* = \arg\min_\theta\; \mathbb{E}_{t}\Big\{ \lambda(t)\, \mathbb{E}_{\mathbf{x}(0)}\, \mathbb{E}_{\mathbf{x}(t) \mid \mathbf{x}(0)} \Big[ \big\| s_\theta(\mathbf{x}(t), t) - \nabla_{\mathbf{x}(t)} \log p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) \big\|_2^2 \Big] \Big\}.$$

Here, $p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0))$ is the perturbation kernel induced by the forward SDE, available in closed form for common SDE choices, and $\lambda(t)$ is a positive, time-dependent weighting function.
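As a hedged sketch of this objective, the following PyTorch code implements denoising score matching for the VE kernel $p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) = \mathcal{N}(\mathbf{x}(0), \sigma(t)^2 \mathbf{I})$ with the common weighting $\lambda(t) = \sigma(t)^2$; the tiny MLP `ScoreNet` and all hyperparameters are illustrative stand-ins for the U-Net-style architectures used in practice, and `data` comes from the forward-SDE sketch above.

```python
import torch.nn as nn

def sigma(t: torch.Tensor, sigma_min: float = 0.01, sigma_max: float = 50.0) -> torch.Tensor:
    """Marginal perturbation std of the VE SDE: sigma(t) = sigma_min * (sigma_max/sigma_min)^t."""
    return sigma_min * (sigma_max / sigma_min) ** t

class ScoreNet(nn.Module):
    """A deliberately tiny MLP s_theta(x, t); real models are far larger."""
    def __init__(self, dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        t = t.expand(x.shape[0])             # accept scalar or per-sample t
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def dsm_loss(model: nn.Module, x0: torch.Tensor) -> torch.Tensor:
    """Denoising score matching: match s_theta(x(t), t) to the analytic
    conditional score -(x(t) - x(0)) / sigma(t)^2, weighted by sigma(t)^2."""
    t = torch.rand(x0.shape[0])              # t ~ Uniform(0, 1)
    s = sigma(t)[:, None]
    noise = torch.randn_like(x0)
    xt = x0 + s * noise                      # sample from the perturbation kernel
    target = -noise / s                      # = grad log p_0t(x(t) | x(0))
    return ((s * (model(xt, t) - target)) ** 2).sum(dim=-1).mean()

model = ScoreNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):                        # toy training loop
    opt.zero_grad()
    dsm_loss(model, data).backward()
    opt.step()

# After training: samples = reverse_sde_sample(model, (64, 2))
```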
Once $s_\theta$ is trained, it replaces the unknown score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ in the reverse SDE, and generating data reduces to numerically integrating this SDE. Standard solvers such as Euler–Maruyama or higher-order methods (e.g., stochastic Runge–Kutta) may be adopted, subject to stability and efficiency constraints.
3. Predictor–Corrector Sampling Paradigm
To mitigate discretization artifacts and improve sample quality, the predictor–corrector (PC) framework alternates between two procedures at each timestep:
- Predictor step: Advances the sample one timestep using a numerical discretization of the reverse SDE (e.g., an Euler–Maruyama or ancestral sampling step).
- Corrector step: Executes score-based MCMC (notably annealed Langevin dynamics) to move the sample closer to high-density regions, using
$$\mathbf{x}_{i+1} = \mathbf{x}_i + \epsilon\, s_\theta(\mathbf{x}_i, t) + \sqrt{2\epsilon}\,\mathbf{z}_i,$$
with $\mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and step size $\epsilon > 0$.
This combination effectively fuses deterministic approximation with stochastic exploration, leading to measurable improvements in sample fidelity (e.g., lower FID) at fixed computational cost.
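A minimal predictor-corrector loop under the same VE assumptions (reusing `g_ve` from Section 1) might look like this; setting the Langevin step size from a target signal-to-noise ratio `snr` is one common heuristic, and the specific value is illustrative.

```python
@torch.no_grad()
def pc_sample(score_fn, shape, n_steps: int = 1000, corrector_steps: int = 1,
              snr: float = 0.16, sigma_max: float = 50.0) -> torch.Tensor:
    """Predictor-corrector sampling: an Euler-Maruyama predictor step of the
    reverse SDE, then Langevin corrector steps x <- x + eps*score + sqrt(2 eps)*z."""
    x = sigma_max * torch.randn(shape)
    dt = 1.0 / n_steps
    for i in reversed(range(n_steps)):
        t = torch.tensor((i + 1) * dt)
        g = g_ve(t)
        # Predictor: one reverse-SDE step (as in reverse_sde_sample above).
        x = x + (g ** 2) * score_fn(x, t) * dt + g * math.sqrt(dt) * torch.randn_like(x)
        # Corrector: annealed Langevin dynamics at the current noise level.
        for _ in range(corrector_steps):
            grad = score_fn(x, t)
            z = torch.randn_like(x)
            # Heuristic step size targeting a fixed signal-to-noise ratio.
            eps = (2 * (snr * z.norm() / grad.norm()) ** 2).item()
            x = x + eps * grad + math.sqrt(2 * eps) * z
    return x
```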
4. Probability Flow ODE and Likelihood Computation
The reverse SDE admits a deterministic counterpart, the probability flow ODE

$$\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = \mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x}),$$

which shares the same marginal distributions $p_t$ as the SDE. Substituting the learned score $s_\theta$ for $\nabla_{\mathbf{x}} \log p_t$ allows deterministic generation of samples and, critically, enables exact likelihood evaluation via the instantaneous change-of-variables formula

$$\log p_0(\mathbf{x}(0)) = \log p_T(\mathbf{x}(T)) + \int_0^T \nabla \cdot \tilde{\mathbf{f}}_\theta(\mathbf{x}(t), t)\, \mathrm{d}t, \qquad \tilde{\mathbf{f}}_\theta(\mathbf{x}, t) = \mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^2\, s_\theta(\mathbf{x}, t).$$

The divergence term can be estimated efficiently with unbiased trace estimators (such as the Skilling–Hutchinson estimator). This capability distinguishes the framework from GAN-based models and standard diffusion models, which lack exact tractable likelihoods, and from normalizing flows, which require restrictive invertibility constraints.
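The following sketch estimates the log-likelihood by integrating the probability flow ODE under the same VE assumptions, with the Skilling–Hutchinson estimator for the divergence; a fixed-step Euler scheme stands in for the adaptive ODE solvers used in practice, and a single probe vector per step keeps the sketch short at the cost of estimator variance.

```python
def ode_log_likelihood(score_fn, x0: torch.Tensor, n_steps: int = 500,
                       sigma_max: float = 50.0) -> torch.Tensor:
    """Estimate log p_0(x0) via the probability flow ODE (VE case, f = 0):
        dx/dt = f_tilde(x, t) = -0.5 * g(t)^2 * score(x, t),
        log p_0(x(0)) = log p_T(x(T)) + int_0^T div f_tilde dt,
    with div f_tilde estimated as v^T (d f_tilde / d x) v for v ~ N(0, I)."""
    x = x0.clone()
    dt = 1.0 / n_steps
    logdet = torch.zeros(x.shape[0])
    for i in range(n_steps):
        t = torch.tensor((i + 0.5) * dt)     # midpoint of the current step
        x = x.detach().requires_grad_(True)
        f_tilde = -0.5 * g_ve(t) ** 2 * score_fn(x, t)
        v = torch.randn_like(x)
        # One vector-Jacobian product gives v^T (d f_tilde / d x); dot with v.
        (vjp,) = torch.autograd.grad((f_tilde * v).sum(), x)
        logdet = logdet + (vjp * v).sum(dim=-1).detach() * dt
        x = (x + f_tilde * dt).detach()      # Euler step from t to t + dt
    # Terminal log-density under the Gaussian prior N(0, sigma_max^2 I).
    d = x.shape[-1]
    log_pT = (-0.5 * (x ** 2).sum(dim=-1) / sigma_max ** 2
              - 0.5 * d * math.log(2 * math.pi * sigma_max ** 2))
    return log_pT + logdet
```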
5. Applications and Empirical Performance
Score-based generative modeling has been adapted to a variety of practical settings:
- Class-conditional image generation: Conditioning the reverse process on labels, by adding the gradient of a time-dependent classifier trained on noisy data to the learned score, enables targeted synthesis (a minimal sketch follows this list).
- Inverse problems: Extensions enable recovery in image inpainting, colorization, and other ill-posed reconstructions by modifying the reverse SDE or ODE to respect observed-data constraints.
- Architectural advances: Employing modern network designs (residual blocks, progressive growing, skip connections) yields further empirical improvements.
- Metric results: On CIFAR-10, an Inception score (IS) of 9.89 and an FID of 2.20 are reported, together with a likelihood of 2.99 bits/dim on uniformly dequantized data, establishing state-of-the-art sample quality and log-likelihood at the time of publication.
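For the class-conditional case, a minimal sketch of combining the unconditional score with a classifier gradient follows; `classifier(x, t)` is an assumed noise-conditional network returning class logits (a hypothetical component, not defined elsewhere in this document), and the Bayes'-rule identity $\nabla_{\mathbf{x}} \log p_t(\mathbf{x} \mid y) = \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) + \nabla_{\mathbf{x}} \log p_t(y \mid \mathbf{x})$ is what the code implements.

```python
def conditional_score(score_fn, classifier, x: torch.Tensor, t: torch.Tensor,
                      y: torch.Tensor) -> torch.Tensor:
    """Class-conditional score via Bayes' rule:
    grad log p_t(x | y) = grad log p_t(x) + grad log p_t(y | x).
    `classifier` is an assumed time-dependent classifier on noisy inputs."""
    with torch.enable_grad():                # works even inside no_grad samplers
        x = x.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x, t), dim=-1)
        selected = log_probs[torch.arange(x.shape[0]), y].sum()
        (class_grad,) = torch.autograd.grad(selected, x)
    return score_fn(x, t) + class_grad

# Usage: wrap into a score function and reuse any sampler above, e.g.
#   guided = lambda x, t: conditional_score(model, classifier, x, t, labels)
#   samples = pc_sample(guided, (64, 2))
```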
6. Mathematical Structure and Generalization
The framework can be summarized by key mathematical components:
| Component | Formulation | Purpose |
| --- | --- | --- |
| Forward SDE | $\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}$ | Diffuse data to a tractable prior |
| Reverse SDE | $\mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}$ | Denoise/reconstruct data from noise |
| Training objective (score net) | Denoising score matching (see above) | Learn the time-dependent score $s_\theta$ |
| Probability flow ODE | $\mathrm{d}\mathbf{x}/\mathrm{d}t = \mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ | Deterministic mapping; enables exact likelihood |
| Likelihood formula | Instantaneous change of variables (see above) | Evaluate $\log p_0(\mathbf{x})$ for samples |
7. Unification, Extensions, and Implications
This framework generalizes and subsumes prior approaches, including denoising score matching (Vincent, 2011), noise-conditional score networks with annealed Langevin dynamics (SMLD), and denoising diffusion probabilistic models (DDPM). The central insight is the continuous transformation of densities in probability space via score-driven flows; the time-dependent neural network instantiation for score estimation enables learning flexible, high-dimensional data distributions.
The predictor-corrector structure and probability flow ODE extend to alternative SDE/ODE designs and to hybrid routines that mix stochastic sampler steps with deterministic ODE integration. The approach scales to high-resolution, high-dimensional data, supports flexible conditioning, and applies to domains such as image synthesis, audio synthesis, and inverse problems.
References
- "Score-Based Generative Modeling through Stochastic Differential Equations" (Song et al., 2020)
- For specific performance metrics and methods details, see (Song et al., 2020).
This amalgamation of diffusion processes, score learning, and SDE/ODE-based sampling sets a foundation for current and emerging lines of research in generative modeling and its theoretical guarantees.