Score-Based Modeling with SDEs
- Score-based modeling with SDEs is a framework that reverses a diffusion process mapping complex data distributions to tractable noise, using time-dependent estimates of the log-density gradient (the score).
- It integrates forward and reverse SDE formulations with neural networks to accurately estimate score functions, driving state-of-the-art generative modeling and inverse problem solutions.
- Applications include image synthesis, MRI reconstruction, and high-dimensional simulations, with recent computational advances enhancing efficiency and sample quality.
Score-based modeling with stochastic differential equations (SDEs) is a framework in generative modeling that formulates data generation as the reversal of a diffusion process that gradually transforms complex data distributions into simple, tractable priors such as Gaussian noise. The reverse evolution is governed by SDEs whose drift term is determined by the time-dependent gradient of the log-density (the score) of the evolving data distribution. This paradigm unifies and extends previous approaches, such as diffusion probabilistic models, continuous normalizing flows, and score-matching Langevin dynamics, and has led to state-of-the-art results in image synthesis, scientific simulation, and inverse problems. Score-based SDE modeling encompasses SDE definition and reversal, score function estimation, advanced sampling strategies, and a wide array of applications across domains.
1. Mathematical Foundations: Forward and Reverse SDEs
The basis of score-based generative modeling is the definition of a forward SDE, typically of the Itô form
$$dx = f(x, t)\,dt + g(t)\,dw,$$
where $f(x, t)$ is a drift vector field, $g(t)$ is a (possibly scalar) diffusion coefficient, and $w$ is a standard Wiener process (Song et al., 2020). Starting with data $x(0) \sim p_0$, this process perturbs the data with increasingly strong noise, producing a family of distributions $\{p_t\}_{t \in [0,T]}$ that converges to a tractable prior $p_T$ (e.g. Gaussian).
The reversal of this process, constructing samples from $p_0$ by starting from noise, relies on Anderson's time-reversal of SDEs:
$$dx = \left[ f(x, t) - g(t)^2\, \nabla_x \log p_t(x) \right] dt + g(t)\, d\bar{w},$$
where $\bar{w}$ is a backward-time Wiener process. The correction term involving the score $\nabla_x \log p_t(x)$ enables the controlled removal of noise, transforming simple noise into structured data.
Variants of SDEs, such as the Variance Exploding (VE) SDE (Song et al., 2020) and the Variance Preserving (VP) SDE, instantiate different noise schedules and forward processes: the VE SDE arises as the continuous-time limit of score matching with Langevin dynamics (SMLD), while the VP SDE is the continuous-time limit of denoising diffusion probabilistic models (DDPM).
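As a concrete illustration, the following is a minimal NumPy sketch of the Gaussian perturbation kernels induced by the VE and VP SDEs; the schedule parameters (sigma_min, sigma_max, beta_min, beta_max) are illustrative defaults rather than canonical choices:

```python
import numpy as np

def ve_perturb(x0, t, sigma_min=0.01, sigma_max=50.0):
    """VE SDE with geometric sigma(t): the transition kernel is
    approximately N(x0, sigma(t)^2 I) when sigma(0) is negligible."""
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    return x0 + sigma_t * np.random.randn(*x0.shape), sigma_t

def vp_perturb(x0, t, beta_min=0.1, beta_max=20.0):
    """VP SDE with linear beta(t): the transition kernel is
    N(m(t) * x0, (1 - m(t)^2) I), where m(t) = exp(-0.5 * int_0^t beta(s) ds)."""
    log_m = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    m, std = np.exp(log_m), np.sqrt(1.0 - np.exp(2 * log_m))
    return m * x0 + std * np.random.randn(*x0.shape), std
```

Because both transition kernels are Gaussian, samples of $x(t)$ can be drawn in closed form without simulating the forward SDE, which is what makes the denoising score matching training of Section 2 efficient.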
2. Score Function Estimation
The key challenge is evaluation of the score function $\nabla_x \log p_t(x)$ for all $t \in [0, T]$. This is addressed by parameterizing a time-dependent neural network $s_\theta(x, t)$ and training it via denoising score matching:
$$\theta^* = \arg\min_\theta\; \mathbb{E}_t \Big\{ \lambda(t)\, \mathbb{E}_{x(0)}\, \mathbb{E}_{x(t) \mid x(0)} \big[ \left\| s_\theta(x(t), t) - \nabla_{x(t)} \log p_{0t}(x(t) \mid x(0)) \right\|_2^2 \big] \Big\},$$
where $\lambda(t)$ is a positive weighting function and $p_{0t}(x(t) \mid x(0))$ is the known transition kernel of the forward SDE (typically Gaussian if $f$ and $g$ are appropriately chosen) (Song et al., 2020, Song et al., 2021).
Variants such as sliced score matching (Tang et al., 12 Feb 2024), score-PINN, and categorical ratio matching for discrete data (Sun et al., 2022) alter the loss to suit high-dimensional, discrete, or physics-constrained settings.
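As a sketch under assumed interfaces (a score_model callable and a perturb function returning the mean and standard deviation of the Gaussian transition kernel), the denoising objective can be implemented as follows in PyTorch; the $\lambda(t) = \sigma(t)^2$ rescaling shown is one common choice, not the only one:

```python
import torch

def dsm_loss(score_model, x0, perturb, eps=1e-5):
    """Denoising score matching loss with lambda(t) = std(t)^2.

    score_model: assumed callable (x_t, t) -> estimated score, shape (B, D).
    perturb: assumed callable (x0, t) -> (mean, std) of the Gaussian
             transition kernel p_0t; mean: (B, D), std: (B,).
    """
    t = torch.rand(x0.shape[0], device=x0.device) * (1.0 - eps) + eps
    mean, std = perturb(x0, t)
    z = torch.randn_like(x0)
    x_t = mean + std[:, None] * z            # sample x(t) ~ p_0t(. | x0)
    score = score_model(x_t, t)
    # target is grad log p_0t(x_t | x0) = -z / std; with lambda = std^2 the
    # weighted residual simplifies to std * score + z
    return ((std[:, None] * score + z) ** 2).sum(dim=1).mean()
```

The regression target $-z/\sigma(t)$ equals $\nabla_{x(t)} \log p_{0t}(x(t) \mid x(0))$ exactly for a Gaussian kernel, so training never requires access to the unknown marginal score.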
3. Sampling Algorithms and Probability Flow ODEs
Sample generation in score-based SDE models is implemented by numerically integrating the reverse-time SDE with solvers such as Euler–Maruyama and adaptive improved Euler schemes (Jolicoeur-Martineau et al., 2021), or with predictor–corrector methods (Song et al., 2020). A key innovation is the Predictor–Corrector (PC) sampler, which alternates between a numerical reverse SDE step (the "predictor") and a Markov chain Monte Carlo correction (the "corrector"), often implemented via (annealed) Langevin dynamics.
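A minimal PC sampler sketch, assuming callables f, g, and score_model matching the SDE defined in Section 1; the step counts and signal-to-noise ratio are illustrative settings rather than tuned values:

```python
import torch

@torch.no_grad()
def pc_sampler(score_model, shape, f, g, n_steps=1000, n_corr=1, snr=0.16, device="cpu"):
    """Predictor-corrector sampling: Euler-Maruyama reverse-SDE steps
    interleaved with annealed Langevin corrections at each noise level."""
    x = torch.randn(shape, device=device)                 # start from the prior
    ts = torch.linspace(1.0, 1e-3, n_steps, device=device)
    dt = -(ts[0] - ts[1])                                 # negative: reverse time
    for t in ts:
        tb = t.expand(shape[0])
        # predictor: one Euler-Maruyama step of the reverse-time SDE
        drift = f(x, tb) - g(t) ** 2 * score_model(x, tb)
        x = x + drift * dt + g(t) * torch.sqrt(-dt) * torch.randn_like(x)
        # corrector: Langevin MCMC with an SNR-scaled step size
        for _ in range(n_corr):
            grad = score_model(x, tb)
            noise = torch.randn_like(x)
            g_norm = grad.reshape(shape[0], -1).norm(dim=1).mean()
            n_norm = noise.reshape(shape[0], -1).norm(dim=1).mean()
            step = 2 * (snr * n_norm / g_norm) ** 2
            x = x + step * grad + torch.sqrt(2 * step) * noise
    return x
```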
In addition, the probability flow ODE, a deterministic analogue of the SDE, is defined as
$$\frac{dx}{dt} = f(x, t) - \frac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x).$$
It shares the same marginals $p_t$ as the SDE, enables exact likelihood computation, and allows for ODE-based adaptive sampling (Song et al., 2020, Song et al., 2021). Analyses have demonstrated that the distributions sampled via the ODE and SDE can differ depending on score approximation, and that regularization to enforce Fokker–Planck consistency can control this gap (Deveney et al., 2023).
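For completeness, a minimal sketch of deterministic sampling via the probability flow ODE using an off-the-shelf adaptive integrator; the score_fn, f, and g interfaces are assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

def probability_flow_sample(score_fn, f, g, dim, t1=1.0, t0=1e-3, seed=0):
    """Draw one sample by integrating the probability flow ODE backwards
    from the prior at t1 to the data distribution at t0."""
    x1 = np.random.default_rng(seed).standard_normal(dim)   # prior sample
    def ode_fn(t, x):
        return f(x, t) - 0.5 * g(t) ** 2 * score_fn(x, t)   # PF-ODE drift
    sol = solve_ivp(ode_fn, (t1, t0), x1, method="RK45", rtol=1e-5, atol=1e-5)
    return sol.y[:, -1]
```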
4. Applications: Generative Modeling, Inverse Problems, and Beyond
Score-based SDE models are effective for unconditional generation, inpainting, conditional generation, and more general inverse problems:
- Inverse problems, such as MRI or CT reconstruction, are handled by augmenting the drift of the reverse SDE with the gradient of the log-likelihood of the observations (Chung et al., 2021); inference alternates between reverse SDE steps and measurement-consistency projections (see the sketch after this list).
- Data assimilation and trajectory inference are addressed by decomposing long trajectory score functions into local score models, applying non-autoregressive inference for stochastic dynamical systems (Rozet et al., 2023, Huynh et al., 9 Aug 2025).
- Dirichlet and categorical discrete processes extend SDE-based score methods to biological sequence generation and combinatorial constraints using continuous-time Markov jumps or processes on the probability simplex (Avdeyev et al., 2023, Sun et al., 2022).
- Applications to high-dimensional and infinite-dimensional settings, including SPDEs and Hilbert spaces, have been rigorously formulated using Malliavin calculus and operator-theoretic tools (Mirafzali et al., 27 Aug 2025, Mirafzali et al., 8 Jul 2025), supporting infinite-dimensional regression and kernel methods.
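As an illustration of the inverse-problem setting above, the following sketch augments the reverse-SDE drift with the likelihood gradient of a linear Gaussian measurement model $y = Ax + \varepsilon$; the interfaces and the explicit gradient form are assumptions in the spirit of Chung et al. (2021), who additionally interleave hard data-consistency projections:

```python
import math
import torch

@torch.no_grad()
def guided_reverse_step(x, t, dt, score_model, f, g, A, y, sigma_y):
    """One Euler-Maruyama step of the reverse SDE with measurement guidance.

    The learned score is augmented with grad_x log p(y | x) for an assumed
    linear Gaussian model y = A x + noise. Shapes: x (B, D), A (M, D),
    y (B, M); t is a float in (0, 1], dt < 0 (reverse time)."""
    tb = torch.full((x.shape[0],), t, device=x.device)  # shared time for batch
    residual = y - x @ A.T                              # measurement residual
    likelihood_grad = residual @ A / sigma_y**2         # grad_x log p(y | x)
    posterior_score = score_model(x, tb) + likelihood_grad
    drift = f(x, tb) - g(t) ** 2 * posterior_score
    return x + drift * dt + g(t) * math.sqrt(-dt) * torch.randn_like(x)
```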
5. Theoretical Guarantees and Convergence
Recent work addresses the convergence and correctness of score-based SDE models:
- Under mild assumptions (e.g., only an $L^2$-accurate score estimate), polynomial convergence in total variation and Wasserstein distances can be guaranteed for general multi-modal, non-smooth distributions (Lee et al., 2022).
- Maximum likelihood formulations for score-based SDEs have been developed, leveraging weighted score matching losses to upper bound negative log-likelihood and relate the framework to continuous normalizing flows (CNFs) (Song et al., 2021).
- Malliavin calculus, including the Bismut–Elworthy–Li formula in both finite and infinite dimensions, provides closed-form expressions for the score function in terms of stochastic variational processes and allows operator-theoretic, non-discretization-based derivations (Mirafzali et al., 8 Jul 2025, Mirafzali et al., 21 Mar 2025, Mirafzali et al., 27 Aug 2025).
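Schematically, the maximum likelihood connection rests on a bound of the following form (Song et al., 2021), stated here in simplified notation:
$$D_{\mathrm{KL}}(p_0 \,\|\, p_\theta) \;\le\; \mathcal{L}_{\mathrm{DSM}}\big(\theta;\, \lambda(t) = g(t)^2\big) \;+\; D_{\mathrm{KL}}(p_T \,\|\, \pi),$$
so training with the likelihood weighting $\lambda(t) = g(t)^2$ minimizes an upper bound on the negative log-likelihood, up to a constant independent of $\theta$.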
6. Computational Advancements and Extensions
Recent innovations have targeted the computational bottlenecks in score-based modeling:
- Adaptive step size SDE solvers significantly reduce computational cost while preserving or improving sample quality in high-resolution settings by using local error estimates in a scaled $\ell_2$ norm (Jolicoeur-Martineau et al., 2021).
- Wavelet factorization across scales (WSGM) (Guth et al., 2022) achieves linear scaling with image size in sampling, addressing the ill-conditioning that historically plagued high-resolution diffusion sampling.
- Simulation-free and adjoint-free training methods (e.g., SDE Matching (Bartosh et al., 4 Feb 2025)) eliminate the need for numerical forward SDE trajectories in training latent SDEs.
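A minimal sketch of the accept/reject logic behind such adaptive solvers, comparing low- and high-order proposals that share the same Brownian increment; the tolerances, safety factor, and exponent are illustrative rather than the published settings:

```python
import numpy as np

def step_control(x_low, x_high, h, atol=1e-2, rtol=1e-2, safety=0.9, exponent=0.5):
    """Accept/reject and step-size update for an adaptive reverse-SDE solver.

    x_low and x_high are proposals from a low-order (Euler-Maruyama) step and
    a higher-order (improved Euler) step driven by the same Brownian increment,
    in the spirit of Jolicoeur-Martineau et al. (2021)."""
    scale = atol + rtol * np.maximum(np.abs(x_low), np.abs(x_high))
    err = np.sqrt(np.mean(((x_low - x_high) / scale) ** 2))  # scaled l2 error
    err = max(err, 1e-12)                                    # guard against 0
    accept = err <= 1.0                                      # within tolerance?
    h_new = h * min(5.0, max(0.2, safety * err ** (-exponent)))
    return accept, h_new
```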
7. Future Directions and Open Challenges
Score-based SDE modeling continues to be a dynamic area:
- The tuning of noise schedules, SDE types, and step sizes remains heuristic; more principled, theoretically informed selection is an open direction (Song et al., 2020).
- Extending the framework to nonlinear SPDEs with state-dependent (multiplicative) noise, non-Gaussian noise settings, and further hybridization with consistency models and reinforcement learning is under active investigation (Tang et al., 12 Feb 2024).
- Infinite-dimensional extensions and connections to functional regression and neural operator architectures suggest significant future applications in scientific computing, imaging, and functional data analysis (Mirafzali et al., 27 Aug 2025).
Score-based modeling with SDEs provides a mathematically principled, highly general framework that connects probabilistic diffusion, neural estimation, and modern sampling strategies, enabling generative modeling and inference in complex, high-dimensional data domains.