Diffusion & Score-Based Models

Updated 13 May 2026

Diffusion models and score-based approaches are generative frameworks that corrupt data via stochastic differential equations and reverse the process with learned score functions.
They leverage denoising score matching, PDE regularization, and tools like Malliavin calculus to enhance sample quality, stability, and likelihood estimation.
These methods extend to infinite dimensions and discrete domains, finding applications in inverse problems, physics-informed modeling, and efficient data augmentation.

Diffusion models and score-based generative methods constitute one of the principal classes of modern deep generative modeling. These frameworks are founded on stochastic differential equations (SDEs) that systematically corrupt data to a noise prior—typically (but not always) Gaussian—then generate samples by reversing this process using learned estimators of the "score," i.e., the gradient of the log-density of noisy data. The mathematical scope of these approaches spans finite- and infinite-dimensional vector spaces, supports both continuous and discrete states, and leverages a diverse spectrum of estimation and regularization tools. Recent advances have solidified both the theoretical and practical underpinnings of these models, connecting their stochastic and PDE structure, extending their reach to function spaces, and improving sample quality, efficiency, and robustness.

1. Mathematical Foundations of Diffusion and Score-Based Models

At the heart of continuous-state diffusion models is the construction of a forward noising SDE,

$dX_t = f(X_t, t)\,dt + g(X_t, t)\,dW_t, \qquad X_0 \sim p_\text{data}$

with a corresponding reverse-time SDE,

$dY_t = \left[f(Y_t, t) - g(Y_t, t)^2 \nabla \log p_t(Y_t)\right]dt + g(Y_t, t)\,d\overleftarrow W_t,$

where $p_t$ is the marginal law of the forward process at time $t$ and $\nabla\log p_t$ is the score function (Song et al., 2021, Du et al., 2022, Lai et al., 2022). The reverse drift shown above is exact when $g$ depends only on $t$; for a state-dependent diffusion coefficient it acquires an additional divergence term $-\nabla\cdot[g g^\top](Y_t, t)$. Prominent parameterizations include the Variance-Preserving (VP), Variance-Exploding (VE), and sub-VP SDEs, as well as generalizations such as FP-Diffusion, which introduces a position-dependent Riemannian metric and anti-symmetric drift (Du et al., 2022).
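
To make the forward/reverse pair concrete, the following is a minimal sketch rather than a reference implementation. It assumes a one-dimensional VP SDE with constant $\beta$ (so $f(x,t) = -\tfrac{1}{2}\beta x$ and $g = \sqrt{\beta}$) and Gaussian data $\mathcal{N}(m, s^2)$, for which the marginal score is known in closed form and no network is needed. The function names and parameter values are illustrative choices, not taken from the cited works.

```python
import numpy as np

def vp_marginal(t, m, s, beta):
    """Mean and variance of p_t for data N(m, s^2) under a constant-beta VP SDE."""
    alpha = np.exp(-0.5 * beta * t)
    return alpha * m, alpha**2 * s**2 + (1.0 - alpha**2)

def score(x, t, m, s, beta):
    """Exact score of the Gaussian marginal p_t (stands in for a learned s_theta)."""
    mean, var = vp_marginal(t, m, s, beta)
    return -(x - mean) / var

def reverse_sample(n, m=2.0, s=0.5, beta=1.0, T=8.0, steps=800, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE from t = T down to t = 0."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    y = rng.standard_normal(n)  # start from the N(0, 1) noise prior
    for k in range(steps):
        t = T - k * dt
        # reverse drift f - g^2 * score, with f = -beta*y/2 and g^2 = beta
        drift = -0.5 * beta * y - beta * score(y, t, m, s, beta)
        # stepping backward in time flips the sign of the drift term
        y = y - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n)
    return y

samples = reverse_sample(20_000)
print(samples.mean(), samples.std())
```

With these settings the sample mean and standard deviation should land close to the data values $m$ and $s$, up to Euler discretization and Monte Carlo error.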

The theoretical machinery comprises connections to the Fokker–Planck equation, reversible and irreversible SDEs, and normalizing flows. In the finite-dimensional linear case with additive noise, the score has analytic form; in nonlinear or infinite-dimensional settings, tools from Malliavin calculus become central (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 8 Jul 2025).
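
As an illustration of the analytic linear case, consider the VP-type SDE $dX_t = -\tfrac{1}{2}\beta(t) X_t\,dt + \sqrt{\beta(t)}\,dW_t$. The symbols $\beta(t)$, $\alpha_t$, and $\sigma_t$ are notation introduced here for the example. The transition kernel is Gaussian,

$p_{0t}(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \alpha_t x_0,\ \sigma_t^2 I\right), \qquad \alpha_t = \exp\!\left(-\tfrac{1}{2}\int_0^t \beta(u)\,du\right), \qquad \sigma_t^2 = 1 - \alpha_t^2,$

so its score is available without any density estimation:

$\nabla_{x_t} \log p_{0t}(x_t \mid x_0) = -\frac{x_t - \alpha_t x_0}{\sigma_t^2}.$

Averaging over the posterior of $x_0$ given $x_t$ yields the marginal score $\nabla_{x_t}\log p_t(x_t) = \left(\alpha_t\,\mathbb{E}[x_0 \mid x_t] - x_t\right)/\sigma_t^2$, which is why a denoiser and a score estimator carry the same information in this setting.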

Moreover, recent work embeds these models in a variational framework, connecting score matching objectives to lower bounds on the log-likelihood and providing decomposition of error into terms reflecting score estimation, path variability, and proximity to noise priors (Franzese et al., 2022, Song et al., 2021, Liu et al., 8 Nov 2025).
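
One well-known instance of such a bound, stated here for a state-independent diffusion coefficient $g(t)$ and under the regularity conditions of Song et al. (2021), reads

$D_{\mathrm{KL}}\!\left(p_\text{data}\,\|\,p_\theta\right) \le \frac{1}{2}\int_0^T g(t)^2\, \mathbb{E}_{x\sim p_t}\!\left[\|s_\theta(x,t) - \nabla_x \log p_t(x)\|^2\right] dt + D_{\mathrm{KL}}\!\left(p_T\,\|\,\pi\right),$

where $\pi$ denotes the noise prior and $p_\theta$ the law of samples obtained by running the reverse SDE from $\pi$ with $s_\theta$ in place of the true score (both symbols are notation introduced for this example). The first term measures score estimation error and the second measures the mismatch between the terminal forward marginal and the prior.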

2. Score Estimation and Denoising Objectives

Central to generative sampling is the approximation of the score function $s_\theta(x,t) \approx \nabla_x \log p_t(x)$. For linear SDEs with Gaussian transitions, the denoising score matching (DSM) loss is tractable:

$\mathcal{L}(\theta) = \mathbb{E}_{x_0\sim p_\text{data},\, x_t\sim p_{0t}(\cdot|x_0)} \left[\lambda(t) \|s_\theta(x_t, t) - \nabla_{x_t} \log p_{0t}(x_t | x_0)\|^2\right]$

with the likelihood weighting $\lambda(t) = g^2(t)$, under which the objective upper-bounds the negative log-likelihood up to scaling and an additive constant independent of $\theta$, so that training is approximately maximum likelihood (Song et al., 2021). In the general nonlinear setting (or for state-dependent coefficients), recent advances leverage Malliavin calculus to derive exact, closed-form score expressions, either as conditional expectations of Skorokhod integrals (Bismut–Elworthy–Li formula) or through expansions in first and second variation processes, thus bypassing direct differentiation of the density (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 8 Jul 2025, Mirafzali et al., 27 Aug 2025).
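
The DSM objective can be checked numerically in a setting where the answer is known. The sketch below assumes one-dimensional Gaussian data, a constant-$\beta$ VP kernel at a single fixed time $t$, and a linear score model $s(x) = w x + b$ fitted by least squares instead of a neural network. All names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s, beta, t = 2.0, 0.5, 1.0, 0.7           # data N(m, s^2), VP SDE, fixed time
alpha = np.exp(-0.5 * beta * t)
sigma = np.sqrt(1.0 - alpha**2)

x0 = m + s * rng.standard_normal(200_000)    # clean samples
eps = rng.standard_normal(x0.shape)
xt = alpha * x0 + sigma * eps                # noisy samples from p_{0t}(. | x0)
target = -eps / sigma                        # conditional score -(xt - alpha*x0)/sigma^2

def dsm_loss(w, b, lam=beta):
    """Monte Carlo DSM loss for the linear model s(x) = w*x + b, weighted by g^2 = beta."""
    return lam * np.mean((w * xt + b - target) ** 2)

# Minimizing the DSM loss over (w, b) is an ordinary least-squares problem.
design = np.stack([xt, np.ones_like(xt)], axis=1)
(w_fit, b_fit), *_ = np.linalg.lstsq(design, target, rcond=None)

# Closed-form marginal score of p_t = N(alpha*m, alpha^2 s^2 + sigma^2).
var_t = alpha**2 * s**2 + sigma**2
w_true, b_true = -1.0 / var_t, alpha * m / var_t
print(w_fit, w_true, b_fit, b_true)
```

Although the regression target is the conditional score given $x_0$, the fitted coefficients approach those of the marginal score, which is the property that makes DSM usable in practice.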

Regularization is increasingly used to impose global self-consistency on the learned score. The constraint comes from the underlying Fokker–Planck PDE: penalizing the residual of the so-called "score Fokker–Planck equation" improves both likelihood and the conservativity of the learned vector field (Lai et al., 2022).
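
For reference, the constraint can be written out under the simplifying assumption that $g$ depends only on $t$. Dividing the Fokker–Planck equation $\partial_t p_t = -\nabla\cdot(f p_t) + \tfrac{1}{2}g^2 \Delta p_t$ by $p_t$ and using $\Delta p_t / p_t = \Delta \log p_t + \|\nabla \log p_t\|^2$ gives

$\partial_t \log p_t = \tfrac{1}{2} g(t)^2 \left(\Delta \log p_t + \|\nabla \log p_t\|^2\right) - \langle f, \nabla \log p_t\rangle - \nabla\cdot f.$

Taking the spatial gradient and writing $s = \nabla \log p_t$ yields an equation in the score alone:

$\partial_t s = \nabla_x\!\left[\tfrac{1}{2} g(t)^2\, \nabla\cdot s + \tfrac{1}{2} g(t)^2\, \|s\|^2 - \langle f, s\rangle - \nabla\cdot f\right].$

A regularizer of this type penalizes the squared residual of this identity when $s$ is replaced by $s_\theta$; the exact weighting and estimator used in practice are those of the cited work and are not reproduced here.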

Special considerations arise in discrete domains, where the score is interpreted as ratios of conditional probabilities; here, ratio-matching and cross-entropy objectives underpin estimator learning (Sun et al., 2022, Zhang et al., 2024).

3. Extensions: Infinite Dimensions, Discrete Spaces, and Physics-Informed Variants

Score-based frameworks have been rigorously generalized to infinite-dimensional Hilbert spaces, enabling modeling of data as functions (e.g., images, solutions to PDEs). Forward diffusions become stochastic partial differential equations (SPDEs) with trace-class noise, and the score is defined as the Fréchet derivative (gradient) in function space (Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023, Lim et al., 2023). Approximations use neural operator architectures, and multilevel training yields resolution-invariant estimation and sampling cost.
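
The role of trace-class noise can be illustrated with a small sketch. It assumes functions on $[0,1]$, a sine basis, and covariance eigenvalues $\lambda_k = k^{-2}$ truncated to 64 modes; these are choices made for the example, not the construction of any cited paper. Because $\sum_k \lambda_k$ is finite, the expected squared $L^2$ norm of a noise sample does not grow as the grid is refined.

```python
import numpy as np

def sample_noise(n_grid, n_modes=64, n_samples=2000, seed=0):
    """Draw Gaussian random functions with covariance eigenvalues k^-2 on a midpoint grid."""
    rng = np.random.default_rng(seed)
    x = (np.arange(n_grid) + 0.5) / n_grid                  # cell midpoints in [0, 1]
    k = np.arange(1, n_modes + 1)
    basis = np.sqrt(2.0) * np.sin(np.pi * k[:, None] * x[None, :])  # (modes, grid)
    lam = k.astype(float) ** -2.0                           # summable eigenvalues
    xi = rng.standard_normal((n_samples, n_modes))          # independent N(0, 1) coefficients
    return (xi * np.sqrt(lam)) @ basis                      # (samples, grid)

def mean_sq_l2(n_grid):
    """Monte Carlo estimate of E ||W||^2 in L^2([0, 1]) using midpoint quadrature."""
    w = sample_noise(n_grid)
    return np.mean(np.sum(w**2, axis=1) / n_grid)

print(mean_sq_l2(128), mean_sq_l2(512))
```

Both resolutions give essentially the same value, close to the truncated trace $\sum_{k \le 64} k^{-2}$. Grid-wise white noise, by contrast, has a squared norm that grows with the number of grid points, which is why it does not define a valid limit in function space.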

In discrete and categorical domains, the forward process is formulated as a continuous-time Markov chain (CTMC), and sampling employs reverse-time CTMCs parameterized by singleton-conditional ratio scores (Sun et al., 2022). Convergence rates and theoretical guarantees align with those in continuous settings (Zhang et al., 2024).
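
A single-variable example shows how probability ratios play the role of the score. The sketch assumes a uniform rate matrix on $K = 4$ states, for which the forward transition matrix has a closed form; the rate matrix, the initial distribution, and the function names are illustrative assumptions. The reverse-time rates are $\bar{R}_t(x, y) = Q(y, x)\, p_t(y) / p_t(x)$ for $y \neq x$.

```python
import numpy as np

K, r = 4, 1.0
Q = r * (np.ones((K, K)) / K - np.eye(K))    # uniform forward rate matrix, rows sum to 0

def transition(t):
    """Closed-form exp(t*Q) for the uniform rate matrix."""
    decay = np.exp(-r * t)
    return decay * np.eye(K) + (1.0 - decay) * np.ones((K, K)) / K

def reverse_rates(p_t):
    """Reverse-time generator built from the ratios p_t(y) / p_t(x)."""
    ratio = p_t[None, :] / p_t[:, None]      # ratio[x, y] = p_t(y) / p_t(x)
    R = Q.T * ratio                          # off-diagonal: Q[y, x] * ratio[x, y]
    np.fill_diagonal(R, 0.0)
    np.fill_diagonal(R, -R.sum(axis=1))      # diagonal makes each row sum to zero
    return R

p0 = np.array([0.7, 0.2, 0.1, 0.0])          # toy data distribution
t = 0.5
p_t = p0 @ transition(t)                     # forward marginal at time t
R_bar = reverse_rates(p_t)
print(p_t)
print(R_bar)
```

The identity $p_t \bar{R}_t = -p_t Q$ holds for this construction, meaning the reverse chain undoes the forward evolution of the marginal. In a learned model the ratios are what the network estimates, since $p_t$ itself is unavailable.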

Physics-informed score-based diffusion models incorporate physical constraints directly into the score, either via simulation-trained networks or ensemble-based score filters, enabling real-time, recursive Bayesian inference for SPDEs and robust assimilation under high-dimensional, nonlinear, and data-sparse regimes (Huynh et al., 9 Aug 2025).

4. Methodological Innovations and Theoretical Advances

Forward Process Flexibility and Manifold Adaptation

FP-Diffusion generalizes the forward SDE to incorporate anisotropic (state-dependent) noise, Riemannian metrics, and symplectic drifts, ensuring ergodicity, Gaussian invariance, and compatibility with manifold-structured data. This allows the forward process to be jointly learned, matching the SDE to complex data geometries, thereby enlarging the variational family and tightening likelihood bounds (Du et al., 2022).

PDE and Stability Theory

Recent rigorous PDE analyses establish forward and reverse well-posedness via sharp $L^p$ and entropy-based stability bounds. The Li–Yau differential inequality constrains the divergence of the score, preventing instability and ensuring concentration of reverse dynamics onto the data manifold with rate $O(\sqrt{t})$ as $t \to 0$. These results prescribe regularity and stopping-time selection, guiding model and loss design (Liu et al., 8 Nov 2025).

KL Barycenter Fusion

ScoreFusion exploits the linearity of log-densities under the Kullback–Leibler barycenter, enabling fusion of multiple pretrained diffusion models. The resulting fused score is simply a convex combination of auxiliary scores. Learning the simplex weights reduces to linear regression in diffusion time, yielding dimension-free sample complexity and improved adaptation to low-data targets (Liu et al., 2024).
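
The linearity claim can be verified in closed form for Gaussians. The sketch below assumes one-dimensional Gaussian auxiliary densities and takes the barycenter to be the minimizer of $\sum_i \lambda_i\, D_{\mathrm{KL}}(q \,\|\, p_i)$, which is the normalized geometric mixture $\prod_i p_i^{\lambda_i}$. The weights are fixed by hand rather than learned, and all names are illustrative.

```python
import numpy as np

mus = np.array([-1.0, 0.5, 3.0])       # auxiliary model means
sigmas = np.array([1.0, 0.7, 2.0])     # auxiliary model standard deviations
lam = np.array([0.2, 0.5, 0.3])        # simplex weights (fixed here, learned in practice)

def aux_scores(x):
    """Scores of each auxiliary Gaussian, shape (len(x), number of models)."""
    return -(x[:, None] - mus[None, :]) / sigmas[None, :] ** 2

def fused_score(x, weights):
    """Convex combination of the auxiliary scores."""
    return aux_scores(x) @ weights

# The geometric mixture of Gaussians is Gaussian, with precision-weighted parameters.
prec = np.sum(lam / sigmas**2)
bary_mean = np.sum(lam * mus / sigmas**2) / prec
bary_var = 1.0 / prec

x = np.linspace(-4.0, 4.0, 9)
direct = -(x - bary_mean) / bary_var   # score of the barycenter computed directly
print(np.max(np.abs(fused_score(x, lam) - direct)))
```

The two expressions agree to floating-point precision. Since the fused score is linear in the weights, fitting them against a regression target is a linear problem, consistent with the description above.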

Efficient Training, Sampling, and Data Augmentation

Optimal selection of diffusion time $T$ balances the objectives of base-proximity and score-matching accuracy, with auxiliary "bridging" distributions further improving efficiency for small $T$ (Franzese et al., 2022). Score Augmentation (ScoreAug) operates on noisy inputs, replacing classical clean-space augmentations, and enforces equivariance on the score network under general transformations, providing robust gains in limited data regimes (Hou et al., 11 Aug 2025). Analytical results demonstrate that, at high noise scales, well-trained diffusion models converge to a universal linear Gaussian score, enabling substantial acceleration by "teleporting" initial sampling steps analytically (Wang et al., 2023).
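
The high-noise Gaussian regime can be observed on a toy problem. The sketch assumes a symmetric two-component Gaussian mixture in one dimension and additive (VE-style) noise $x_\sigma = x_0 + \sigma\varepsilon$. It compares the exact mixture score with the score of a single Gaussian matched to the data mean and variance; the setup and names are assumptions made for the example.

```python
import numpy as np

mu, s = 1.0, 0.2    # mixture modes at -mu and +mu, each with standard deviation s

def exact_score(x, sigma):
    """Score of 0.5*N(-mu, v) + 0.5*N(mu, v) with v = s^2 + sigma^2."""
    v = s**2 + sigma**2
    return -(x - mu * np.tanh(mu * x / v)) / v

def gaussian_score(x, sigma):
    """Score of the moment-matched Gaussian N(0, s^2 + mu^2 + sigma^2)."""
    data_var = s**2 + mu**2
    return -x / (data_var + sigma**2)

def relative_error(sigma, n=50_000, seed=0):
    """RMS gap between the two scores, relative to the exact score, under p_sigma."""
    rng = np.random.default_rng(seed)
    x0 = rng.choice([-mu, mu], size=n) + s * rng.standard_normal(n)
    x = x0 + sigma * rng.standard_normal(n)
    e, g = exact_score(x, sigma), gaussian_score(x, sigma)
    return np.sqrt(np.mean((e - g) ** 2) / np.mean(e**2))

for sigma in (0.3, 1.0, 3.0, 10.0):
    print(sigma, relative_error(sigma))
```

The relative error shrinks as $\sigma$ grows, so at large noise the linear Gaussian score is a close substitute for the true one. This is the property that permits replacing the earliest sampling steps by an analytic solution.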

5. Applications and Algorithmic Implementations

Bayesian and Inverse Problems

Score-based models have been applied to imaging inverse problems (e.g., MRI, CT) by leveraging the learned score as a data prior combined with data-consistency projections, supporting arbitrary forward operators, uncertainty quantification, and principled Bayesian inference (MAP, MMSE, posterior sampling) (Chung et al., 2021, McCann et al., 2023).
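
The way a prior score combines with a measurement model can be sketched on a linear Gaussian problem. The example assumes a standard normal prior (whose score $-x$ stands in for a learned network), a known forward operator $A$, and Gaussian measurement noise. It samples the posterior with unadjusted Langevin dynamics on the sum of prior and likelihood scores. This is plain posterior-score Langevin sampling, used here for simplicity; it is not the projection-based data-consistency scheme of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5]])            # forward operator: one measurement of two unknowns
tau = 0.3                             # measurement noise standard deviation
x_true = np.array([1.0, -0.5])
y = A @ x_true + tau * rng.standard_normal(1)

def prior_score(x):
    """Score of the N(0, I) prior; a trained score network would go here."""
    return -x

def likelihood_score(x):
    """Gradient of log N(y; A x, tau^2 I) with respect to x, for a batch of rows."""
    return (y - x @ A.T) @ A / tau**2

def langevin(n_chains=4000, steps=2000, step=1e-3):
    """Unadjusted Langevin dynamics targeting the posterior p(x | y)."""
    x = rng.standard_normal((n_chains, 2))
    for _ in range(steps):
        grad = prior_score(x) + likelihood_score(x)
        x = x + step * grad + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

samples = langevin()

# Closed-form Gaussian posterior for comparison.
post_cov = np.linalg.inv(np.eye(2) + A.T @ A / tau**2)
post_mean = post_cov @ (A.T @ y) / tau**2
print(samples.mean(axis=0), post_mean)
```

The spread of the samples provides the uncertainty estimate, and their average approximates the MMSE reconstruction. For nonlinear or learned priors no closed form exists, and the same update is applied with the network score in place of `prior_score`.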

Table: Summary of Major Model Extensions

Domain	Reference	Core Innovation
Infinite-dim	(Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023, Lim et al., 2023)	Malliavin calculus, operator networks, multilevel training
Discrete	(Sun et al., 2022, Zhang et al., 2024)	CTMC forward process, ratio-matching scores
KL fusion	(Liu et al., 2024)	KL barycenter, linear score fusion, low-data adaptation
Physics-informed	(Huynh et al., 9 Aug 2025)	Ensemble score filters, SPDE-adapted sampling
Score regularization	(Lai et al., 2022, Liu et al., 8 Nov 2025)	PDE consistency, Fokker–Planck score equation, Li–Yau inequality

6. Challenges, Open Directions, and Practical Considerations

Open problems include scalable parameterizations for data-adaptive Riemannian metrics, extending score-based approaches to hybrid continuous–discrete spaces, efficient empirical estimation of the score Fokker–Planck residual, analytical determination of optimal diffusion time, and dynamically tuning data augmentation schemes.

In high-dimensional and infinite-dimensional regimes, discretization error and approximate solvers remain practical bottlenecks. Combining data-driven neural estimators with analytic or physics-informed score targets (e.g., via Malliavin expansions or ensemble filters) is a promising avenue for both sample quality and robustness (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 8 Jul 2025, Huynh et al., 9 Aug 2025).

The field is also moving toward a principled synthesis of generative modeling, data-fusion, and domain-aware priors, as exemplified by recent theoretical and algorithmic advances in KL barycenter fusion, infinite-dimensional operator learning, and the integration of physical constraints.

In summary, diffusion models and score-based approaches represent a highly expressive, flexible, and theoretically grounded paradigm for generative modeling. Current research delivers both rigorous mathematical analysis and continuous methodological innovation, with an increasing focus on sample efficiency, robustness across data modalities (discrete, continuous, functional), learned geometric adaptation, and deep links to variational and PDE perspectives (Song et al., 2021, Lai et al., 2022, Du et al., 2022, Zhang et al., 2024, Hagemann et al., 2023, Wang et al., 2023, Liu et al., 2024, Mirafzali et al., 21 Mar 2025, Mirafzali et al., 8 Jul 2025, Mirafzali et al., 27 Aug 2025, Huynh et al., 9 Aug 2025, Hou et al., 11 Aug 2025, Liu et al., 8 Nov 2025, Franzese et al., 2022, Sun et al., 2022).