
Score-Based Diffusion Models

Updated 26 December 2025
  • Score-based diffusion models are generative frameworks that invert noise-adding stochastic processes using the gradient of log-density (score) to recover data samples.
  • They employ methods like denoising score matching with neural networks and SDE/CTMC formulations to ensure accurate high-dimensional sampling and robust uncertainty quantification.
  • The approach extends to infinite-dimensional and discrete settings, integrating Bayesian inference and adaptive filtering while providing rigorous error bounds and convergence guarantees.

Score-based diffusion models define a flexible class of generative modeling techniques for high-dimensional data, where sample generation is accomplished by simulating the time-reversal of a continuous or discrete stochastic process that adds noise to data. The central quantity is the score function—i.e., the gradient of the log density of the data distribution after evolving under the forward process. State-of-the-art frameworks rely on stochastic differential equations (SDEs) or discrete Markov chains, and neural score estimators trained via variants of score matching. Modern advances extend these models to infinite-dimensional settings, adapt them to structurally complex or data-scarce problems, rigorously integrate them into probabilistic inference, and develop efficient computational representations and theoretical underpinnings.

1. Mathematical Formulation and Core Principles

Let $x_0$ denote a sample from the data distribution on a finite- or infinite-dimensional space (typically $\mathbb{R}^d$ or a separable Hilbert space $H$). The forward (noising) process is specified as either an Itô SDE,

$$dX_t = f(X_t, t)\,dt + g(t)\,dW_t, \qquad X_0 \sim p_0,$$

or, in the infinite-dimensional case, as a linear SPDE,

$$du(t) = A\,u(t)\,dt + Q^{1/2}\,dW_t, \qquad u(0) = u_0 \in H,$$

where $A$ generates a strongly continuous semigroup, $Q$ is a positive, self-adjoint, trace-class covariance operator, and $W_t$ is a cylindrical Wiener process.
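As a concrete illustration, the following is a minimal NumPy sketch of simulating such a forward process with Euler–Maruyama steps, assuming zero drift ($f \equiv 0$) and the commonly used geometric variance-exploding schedule $\sigma(t) = \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$; the schedule, parameters, and function names are illustrative choices, not prescribed by the references above.

```python
import numpy as np

def forward_ve_sde(x0, T=1.0, n_steps=1000, sigma_min=0.01, sigma_max=10.0, rng=None):
    """Euler-Maruyama simulation of a variance-exploding forward SDE,
    dX_t = g(t) dW_t, with g(t)^2 = d sigma(t)^2 / dt for an assumed
    geometric schedule sigma(t) = sigma_min * (sigma_max / sigma_min)**t."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        sigma_t = sigma_min * (sigma_max / sigma_min) ** t
        # g(t) = sigma(t) * sqrt(2 log(sigma_max / sigma_min)) so that g^2 = d sigma^2/dt
        g_t = sigma_t * np.sqrt(2.0 * np.log(sigma_max / sigma_min))
        x = x + g_t * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x
```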

The time-marginal density $p_t$ (or the law $Q_t$ in the infinite-dimensional case) evolves under a Fokker–Planck PDE or its functional generalization. Recovering data samples requires inverting this process: the time-reversal yields

$$dX_t = \big[f(X_t, t) - g(t)^2 \nabla_x \log p_t(X_t)\big]\,dt + g(t)\,d\overline{W}_t,$$

where the score $\nabla_x \log p_t$ governs the drift of the reverse process and $\overline{W}_t$ is a reverse-time Wiener process. For infinite-dimensional $H$, the score is formally the Fréchet derivative of the log-density along "accessible" Cameron–Martin directions.
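Given any estimate of the score, the reverse SDE above can be integrated with a plain Euler–Maruyama scheme. The sketch below assumes a placeholder `score_fn(x, t)` standing in for a trained approximation of $\nabla_x \log p_t$; the drift and diffusion coefficients are passed in as generic callables, which is an illustrative interface rather than a fixed API.

```python
import numpy as np

def reverse_sde_sample(score_fn, x_T, T=1.0, n_steps=1000, f_fn=None, g_fn=None, rng=None):
    """Euler-Maruyama discretization of the reverse-time SDE
    dX_t = [f(X_t, t) - g(t)^2 * score(X_t, t)] dt + g(t) dW_t,
    integrated from t = T down to 0. `score_fn(x, t)` stands in for a
    trained estimate of grad_x log p_t(x)."""
    rng = np.random.default_rng() if rng is None else rng
    f_fn = f_fn or (lambda x, t: np.zeros_like(x))   # default: zero drift (VE-type process)
    g_fn = g_fn or (lambda t: 1.0)                   # placeholder diffusion coefficient
    dt = T / n_steps
    x = np.array(x_T, dtype=float)
    for k in range(n_steps, 0, -1):
        t = k * dt
        g = g_fn(t)
        drift = f_fn(x, t) - g**2 * score_fn(x, t)
        # step backwards in time by dt, injecting fresh Gaussian noise
        x = x - drift * dt + g * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x
```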

In discrete domains, the forward process is a continuous-time Markov chain (CTMC) with time-dependent generator $Q_t$ on finite product spaces, and the "score" is encoded as log-ratio functions between configurations (Sun et al., 2022).

Alternative perspectives, such as random-walk-based interpretations, recast sampling as sequences of short Langevin steps using denoising estimates and Tweedie's formula to define the score at finite noise levels (Park et al., 27 Nov 2024).
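For reference, the Tweedie identity underlying this view, for Gaussian corruption $X_\sigma = X_0 + \sigma Z$ with $Z \sim \mathcal{N}(0, I)$, can be stated as follows:

```latex
% Tweedie's formula for Gaussian corruption X_sigma = X_0 + sigma Z:
\mathbb{E}[X_0 \mid X_\sigma = x] \;=\; x + \sigma^2\,\nabla_x \log f_{X_\sigma}(x),
\qquad\text{equivalently}\qquad
\nabla_x \log f_{X_\sigma}(x) \;=\; \frac{\mathbb{E}[X_0 \mid X_\sigma = x] - x}{\sigma^2}.
```

Replacing the exact posterior mean by a learned denoiser gives the score estimate used in the sampler template of Section 3.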

2. Score Function Estimation and Malliavin Calculus Characterizations

The theoretical and algorithmic core is estimation of the score field $\nabla \log p_t(\cdot)$ (or its infinite-dimensional analog). In finite dimensions, the denoising score matching (DSM) objective leverages the availability of the Gaussian forward kernel $p_{0t}(x_t \mid x_0)$; the DSM regression target (the conditional score) for $x_t \sim p_{0t}(\cdot \mid x_0)$ is $-(x_t - x_0)/\sigma^2(t)$ in the variance-exploding (VE) case (Chung et al., 2021, McCann et al., 2023, Tang et al., 12 Feb 2024).

In function space, DSM generalizes via the Feldman–Hajek theorem and Cameron–Martin formula, yielding well-posed losses in the dual Cameron–Martin space of the reference measure (Lim et al., 2023, Mirafzali et al., 27 Aug 2025). An explicit, infinite-dimensional Bismut–Elworthy–Li formula gives the closed-form score for the law of a linear SPDE,

$$\nabla \log p_{u(T)}(u) = -\gamma_{u(T)}^{\dagger}\,\big(u - S(T)\,u_0\big) \in H,$$

where $\gamma_{u(T)}$ is the Malliavin covariance operator and $S(T)$ is the semigroup generated by $A$. Malliavin derivatives and Skorokhod (divergence) integrals allow similar representations for more general SDEs and SPDEs, including explicit formulas for nonlinear cases with state-independent diffusion (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 27 Aug 2025). The covering vector field construction and integration by parts on Wiener space underpin this calculus and enable systematic score computation.
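A short consistency check, under the linear, additive-noise assumptions stated above: the mild solution of the SPDE is Gaussian, and differentiating its log-density along Cameron–Martin directions recovers the quoted expression.

```latex
% For du = A u dt + Q^{1/2} dW_t with semigroup S(t) = e^{tA}, the mild solution at time T is
u(T) \;\sim\; \mathcal{N}\!\big(S(T)\,u_0,\; \gamma_{u(T)}\big),
\qquad
\gamma_{u(T)} \;=\; \int_0^T S(T-s)\, Q\, S(T-s)^{*}\, ds,
% and differentiating the Gaussian log-density along Cameron-Martin directions yields
\nabla \log p_{u(T)}(u) \;=\; -\,\gamma_{u(T)}^{\dagger}\,\big(u - S(T)\,u_0\big),
% matching the Bismut-Elworthy-Li expression quoted above.
```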

3. Training Objectives, Algorithmic Templates, and Discretization

Empirical score estimation is typically accomplished by minimizing the denoising score matching loss over a neural network parameterization $s_\theta(x, t)$, with weighting $\lambda(t)$ reflecting the variance of the noising schedule. For maximum likelihood training, the loss uses $\lambda(t) = g(t)^2$, giving an upper bound on the negative log-likelihood and connecting to continuous normalizing flows via the probability-flow ODE (Song et al., 2021).
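A minimal PyTorch sketch of this objective for a variance-exploding process is given below; the network interface `score_net(x, t)`, the geometric noise schedule, and the weighting choice are illustrative assumptions rather than the exact setups of the cited works.

```python
import torch

def dsm_loss(score_net, x0, sigma_min=0.01, sigma_max=10.0):
    """Denoising score matching loss for a variance-exploding process.
    Target score of the Gaussian forward kernel: -(x_t - x_0) / sigma(t)^2.
    Weighting lambda(t) = sigma(t)^2 gives the standard variance-normalized
    objective; using g(t)^2 instead gives the likelihood-weighted variant."""
    batch = x0.shape[0]
    t = torch.rand(batch, device=x0.device)                  # uniform time samples in [0, 1)
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t       # assumed geometric noise schedule
    sigma_t = sigma_t.view(-1, *([1] * (x0.dim() - 1)))      # broadcast over data dimensions
    noise = torch.randn_like(x0)
    x_t = x0 + sigma_t * noise                               # sample from p_{0t}(. | x0)
    target = -(x_t - x0) / sigma_t**2                        # conditional (VE) score
    pred = score_net(x_t, t)                                 # s_theta(x_t, t)
    weight = sigma_t**2                                      # lambda(t); swap for g(t)^2 for MLE weighting
    return (weight * (pred - target) ** 2).mean()
```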

Modern frameworks explicitly decouple training and sampling schedules: the "random walks with Tweedie" framework posits a unified template where the score is derived from an MMSE denoiser $r_\theta(x, \sigma) \approx \mathbb{E}[X_0 \mid X_\sigma = x]$, with

$$\nabla \log f_{X_\sigma}(x) \approx \frac{r_\theta(x, \sigma) - x}{\sigma^2},$$

and the sampler updates as

$$x_{k+1} = x_k + \tau_k\,\frac{r_\theta(x_k, \sigma_k) - x_k}{\sigma_k^2} + \sqrt{2\,\tau_k\,\mathcal{T}_k}\,z_k$$

for a flexible schedule of $(\sigma_k, \tau_k)$, with $z_k$ standard Gaussian noise (Park et al., 27 Nov 2024).
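A NumPy sketch of this template is shown below; the denoiser interface `denoiser(x, sigma)` and the treatment of $(\sigma_k, \tau_k)$ and the temperatures $\mathcal{T}_k$ as user-supplied arrays are illustrative assumptions.

```python
import numpy as np

def tweedie_random_walk(denoiser, x_init, sigmas, taus, temps=None, rng=None):
    """Sampler template from the 'random walks with Tweedie' viewpoint:
    x_{k+1} = x_k + tau_k * (r(x_k, sigma_k) - x_k) / sigma_k^2
                  + sqrt(2 * tau_k * T_k) * z_k.
    `denoiser(x, sigma)` is a stand-in for an MMSE denoiser r_theta."""
    rng = np.random.default_rng() if rng is None else rng
    temps = temps if temps is not None else np.ones(len(sigmas))
    x = np.array(x_init, dtype=float)
    for sigma_k, tau_k, T_k in zip(sigmas, taus, temps):
        score_est = (denoiser(x, sigma_k) - x) / sigma_k**2   # Tweedie-based score estimate
        x = x + tau_k * score_est + np.sqrt(2.0 * tau_k * T_k) * rng.standard_normal(x.shape)
    return x
```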

Infinite-dimensional and operator-theoretic settings exploit operator-valued neural networks or kernel methods acting on admissible function-space domains. Discretization invariance is achieved by parameterizing in the basis of a trace-class covariance, enabling multilevel and cross-resolution training and sampling (Lim et al., 2023, Hagemann et al., 2023), as illustrated in the sketch below.
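A toy one-dimensional illustration of such a parameterization assumes a sine Karhunen–Loève basis on $[0,1]$ with polynomially decaying (trace-class) eigenvalues: the same coefficient vector defines the sampled function at any grid resolution, which is the mechanism behind discretization invariance. This is only a schematic analogue of the cited operator-based architectures.

```python
import numpy as np

def sample_trace_class_field(n_points, n_modes=64, decay=2.0, rng=None):
    """Draw a Gaussian random field on [0, 1] from a truncated Karhunen-Loeve
    expansion with eigenvalues lambda_k ~ k^(-decay) (trace-class for decay > 1).
    The coefficients z_k are resolution-independent, so the same draw can be
    evaluated on any grid by changing n_points."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, n_points)
    k = np.arange(1, n_modes + 1)
    eigvals = k ** (-decay)
    z = rng.standard_normal(n_modes)                        # resolution-independent coefficients
    basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, x))   # sine eigenbasis on [0, 1]
    return (np.sqrt(eigvals) * z) @ basis                   # field sampled at the requested grid
```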

Auxiliary regularizations, such as enforcing the score Fokker–Planck equation, improve likelihoods and self-consistency by penalizing the FPE residual $\epsilon[s_\theta] := \partial_t s_\theta(x,t) - \nabla_x \mathcal{L}[s_\theta](x,t)$ (Lai et al., 2022).

4. Extensions: Discrete Data, Bayesian Inference, and Infinite-Dimensional Conditional Models

Score-based diffusion methodologies extend to discrete domains by formulating a CTMC with noise-injecting jump rates. Learning is performed via ratio matching of singleton conditionals; the reverse process is an analytically computable CTMC with generator informed by the log-ratio function, enabling faithful modeling of categorical and text/image token data (Sun et al., 2022).
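The structure of the reverse chain can be illustrated numerically on a toy state space: given the forward generator and the time marginals, the reverse jump rates are the forward rates reweighted by probability ratios (the discrete analog of the score). The sketch below, using SciPy's matrix exponential, only demonstrates this reversal identity on a hypothetical three-state chain; it is not the ratio-matching training procedure of the cited work.

```python
import numpy as np
from scipy.linalg import expm

# Toy 3-state forward CTMC generator: rows sum to zero, off-diagonals are jump rates.
Q = np.array([[-1.0, 0.6, 0.4],
              [ 0.3, -0.8, 0.5],
              [ 0.2, 0.7, -0.9]])
p0 = np.array([0.7, 0.2, 0.1])          # data distribution on the 3 states

def reverse_rates(Q, p0, t):
    """Reverse-time jump rates at time t:
    Qhat_t(x -> y) = Q(y -> x) * p_t(y) / p_t(x),
    i.e. forward rates reweighted by probability ratios."""
    p_t = p0 @ expm(t * Q)               # forward marginal at time t (row-vector convention)
    Qhat = Q.T * (p_t[None, :] / p_t[:, None])
    np.fill_diagonal(Qhat, 0.0)
    np.fill_diagonal(Qhat, -Qhat.sum(axis=1))   # make rows sum to zero again
    return Qhat

print(reverse_rates(Q, p0, t=0.5))
```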

For Bayesian inverse problems, the approach is generalized to conditional denoising estimators, which provably learn the conditional score for posterior inference. Infinite-dimensional theory establishes the precise form of the conditional score in function space, with rigorous analysis of regularization and blow-up at small times (i.e., as observational noise vanishes) (Baldassari et al., 2023). Posterior sampling and uncertainty quantification are thus made discretization-invariant and scalable to high-resolution or PDE-governed inverse problems (Feng et al., 2023, McCann et al., 2023).

In high-dimensional physics-influenced stochastic systems (SPDEs), the score-based model is embedded in recursive Bayesian updating, with ensemble-based (training-free) approximations for the score enabling real-time adaptive filtering (Huynh et al., 9 Aug 2025). This facilitates assimilation of new data and accurate uncertainty quantification in infinite-dimensional dynamical settings.
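One simple training-free score construction fits a Gaussian to the ensemble itself; the sketch below uses the empirical mean and covariance and should be read as a minimal stand-in for the more elaborate ensemble-based score approximations in the cited work.

```python
import numpy as np

def ensemble_gaussian_score(ensemble, x, reg=1e-6):
    """Training-free score estimate from an ensemble of states (shape n x d):
    fit a Gaussian N(mu, C) to the particles and return -C^{-1} (x - mu).
    The small ridge term `reg` keeps the covariance invertible."""
    mu = ensemble.mean(axis=0)
    C = np.cov(ensemble, rowvar=False) + reg * np.eye(ensemble.shape[1])
    return -np.linalg.solve(C, x - mu)
```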

5. Theoretical Properties: Convergence, Low-Dimensional Adaptivity, and Regularity

Rigorous non-asymptotic error guarantees relate score estimation error and time discretization to the discrepancy, in total variation and Wasserstein metrics, between the generated and true data measures (Tang et al., 12 Feb 2024). Deterministic, covering-net-based analyses demonstrate that adapted reverse-variance and drift coefficients can yield dimension-free convergence $O(k^2/\sqrt{T})$, where $k$ is the intrinsic (manifold) dimension, even when the ambient dimension $d \gg k$ (Li et al., 23 May 2024).

Trace-class noise models, operator-adapted networks, and Malliavin/Bismut–Elworthy–Li arguments ensure that the infinite-dimensional framework preserves statistical and geometric consistency, accommodates general types of spatial correlation, and avoids discretization artifacts (Hagemann et al., 2023, Lim et al., 2023, Mirafzali et al., 27 Aug 2025).

Enforcing fundamental self-consistency (via score-FPE regularization or likelihood-weighted loss) improves both sample quality and explicit density estimation, and universality results guarantee that function-approximation architectures (e.g., kernel regression, neural operators) can represent the infinite-dimensional scores to arbitrary accuracy given suitable trace-class decay (Lim et al., 2023, Mirafzali et al., 27 Aug 2025).

6. Computational Methods and Practical Considerations

Score-based diffusion generative models are implemented with a variety of architectures, including U-Nets with adaptive time embedding, Fourier Neural Operators, and kernel-based estimators in functional data settings. Training and inference efficiency has been improved by score-embedding (precomputing the score via Fokker–Planck PDE inversion), sliced Wasserstein and sliced score matching objectives for scalable loss evaluation, and adaptive ensemble or hybrid prediction-correction schemes (Na et al., 10 Apr 2024, Hagemann et al., 2023, Huynh et al., 9 Aug 2025).
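As an illustration of the sliced score matching idea mentioned above, the following PyTorch sketch estimates the objective with random projection directions and autograd; the interface `score_net(x)`, which maps inputs to same-shaped score outputs at a fixed noise level, is a hypothetical assumption.

```python
import torch

def sliced_score_matching_loss(score_net, x, n_projections=1):
    """Sliced score matching: estimates E[ v^T J_s(x) v + 0.5 * (v^T s(x))^2 ]
    with random directions v, avoiding the full Jacobian trace of the score.
    Assumes x has shape (batch, d) and score_net returns the same shape."""
    x = x.requires_grad_(True)
    loss = 0.0
    for _ in range(n_projections):
        v = torch.randn_like(x)
        s = score_net(x)                                     # s_theta(x)
        sv = (s * v).sum()                                   # sum over batch of v^T s
        grad_sv = torch.autograd.grad(sv, x, create_graph=True)[0]   # per-sample J_s^T v
        loss = loss + ((grad_sv * v).sum(dim=-1)             # v^T J_s v
                       + 0.5 * (s * v).sum(dim=-1) ** 2).mean()
    return loss / n_projections
```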

Posterior inference in high-dimensional or inverse settings is performed via conditional sampling that enforces measurement consistency either by explicit SDE/ODE augmentation or by integrating the data likelihood into the score field (Feng et al., 2023, Chung et al., 2021). In discrete domains, exact singleton-conditional reverse samplers are implementable due to the analytical structure of the CTMC.
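For a linear Gaussian measurement model $y = Hx + \varepsilon$, one common (and deliberately crude) way to fold the likelihood into the score field is to add the gradient of the measurement log-likelihood to the unconditional score, as sketched below; the cited methods refine or replace this approximation rather than using it verbatim.

```python
import numpy as np

def posterior_score(prior_score_fn, x, t, y, H, noise_std):
    """Guided score for y = H x + N(0, noise_std^2 I):
    grad_x log p_t(x | y) ~= s_theta(x, t) + H^T (y - H x) / noise_std^2.
    Treating the likelihood as acting directly on the noisy iterate x_t is a
    simple approximation; `prior_score_fn` stands in for the trained score."""
    likelihood_grad = H.T @ (y - H @ x) / noise_std**2
    return prior_score_fn(x, t) + likelihood_grad
```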

Ensemble-based score filtering and recursive Bayesian frameworks sidestep neural training for high-dimensional states, enabling real-time filtering in SPDEs and geophysical models (Huynh et al., 9 Aug 2025). Main computational bottlenecks remain in high-resolution PDE solves and covariance construction, but parallelism and low-rank approximations alleviate cost.

7. Open Challenges and Future Directions

Key directions include (a) extension to state-dependent and hypoelliptic diffusions leveraging full Malliavin calculus; (b) improved discretization schemes and error bounds for complex or degenerate priors; (c) hybrid probabilistic inference frameworks combining score-based generative priors with classical variational or MCMC approaches; and (d) robust adaptation to non-Gaussian or heavy-tailed noise structures in scientific and inverse imaging applications (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 27 Aug 2025, Baldassari et al., 2023).

Research is also ongoing on variational design of SDE coefficients, RL-driven generative policies, and consistent objective weighting to trade off sample quality and likelihood (Du et al., 2022, Gao et al., 7 Sep 2024, Song et al., 2021). Algorithmic flexibility in scheduling, denoising parametrizations, and network design continues to significantly impact efficiency, generalization, and theoretical performance across application domains.
