Score-Based Diffusion Approach
- Score-based diffusion is a generative modeling framework that synthesizes data by reversing a noise-injecting process, guided by score functions (gradients of the log-density).
- It leverages rigorous mathematical foundations, including Malliavin calculus, operator theory, and PDEs, to estimate scores accurately in both finite and infinite dimensions.
- The approach underpins state-of-the-art applications in image generation, Bayesian inverse problems, and scientific computing by combining neural network estimators with robust analytical guarantees.
Score-based diffusion approaches comprise a class of generative models in which sample synthesis proceeds by reversing a data-destroying stochastic process, using the score (gradient of the log-density) of the intermediate noisy distributions. The concept generalizes across data domains and forward processes, including stochastic differential equations (SDEs) and Markov jump processes, and underpins state-of-the-art performance in image, scientific-data, discrete, and function-space generative modeling. The framework is mathematically grounded in statistical physics, stochastic analysis (notably Malliavin calculus), partial differential equations (PDEs), and optimal control theory.
1. Core Mathematical Principles
The foundation of a score-based diffusion model is the specification of a forward-time stochastic process that perturbs data into noise, and a reverse-time generative process parameterized by the score of the evolving distribution.
- Forward Process (SDE/Markov): For continuous data, typically an SDE of the form
$$\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t,$$
where $f$ is a drift, $g$ is a (possibly time-dependent) noise amplitude, and $W_t$ is standard Brownian motion (Song et al., 2021, Liu et al., 8 Nov 2025, Du et al., 2022). In infinite dimensions (e.g. functional data), the forward process is a stochastic PDE driven by colored noise with trace-class covariance (Hagemann et al., 2023, Lim et al., 2023, Mirafzali et al., 27 Aug 2025).
For discrete data, the forward process is replaced by a continuous-time Markov chain with a generator (rate matrix) $Q_t$ (Sun et al., 2022).
- Reverse Process and the Score: The reverse-time process that reconstructs data from noise relies on the time-dependent score function $\nabla_x \log p_t(x)$, yielding the SDE
$$\mathrm{d}x_t = \left[f(x_t, t) - g(t)^2\,\nabla_x \log p_t(x_t)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t,$$
where $\bar{W}_t$ is backward Brownian motion (Liu et al., 8 Nov 2025, Song et al., 2021, Mirafzali et al., 27 Aug 2025).
For discrete spaces, gradients are replaced by conditional probability ratios, and reversal is described by a time-inhomogeneous Markov chain whose rates depend on these ratios (Sun et al., 2022).
- Score Estimation: In finite dimensions, the score is estimated by neural networks via denoising score matching or related objectives. In infinite-dimensional Hilbert spaces, the score admits explicit operator-theoretic representations involving Malliavin calculus, often leading to closed-form solutions for Gaussian processes (see Sections 2, 3 below) (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 8 Jul 2025, Mirafzali et al., 27 Aug 2025).
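The denoising score-matching objective named above can be made concrete with a minimal NumPy sketch (illustrative only, not code from the cited papers): for the variance-exploding kernel $x_\sigma = x_0 + \sigma\varepsilon$, the regression target is the conditional score $-\varepsilon/\sigma$, and the exact marginal score attains a lower loss than a mismatched candidate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data x0 ~ N(0, 1); perturb with Gaussian noise of scale sigma, so the
# noisy marginal is p_sigma = N(0, 1 + sigma^2).
x0 = rng.standard_normal(100_000)
sigma = 0.5
eps = rng.standard_normal(x0.shape)
xt = x0 + sigma * eps

# Denoising score matching regresses a candidate score s(x_t) onto the
# conditional score of the Gaussian kernel: grad_x log p(x_t | x0) = -eps/sigma.
target = -eps / sigma

def dsm_loss(score_values):
    return np.mean((score_values - target) ** 2)

exact_score = -xt / (1.0 + sigma**2)     # marginal score of N(0, 1 + sigma^2)
loss_exact = dsm_loss(exact_score)
loss_zero = dsm_loss(np.zeros_like(xt))  # trivial candidate s(x) = 0
```

Up to a constant that does not depend on the candidate, this loss is minimized by the true marginal score, which is why the exact score beats the trivial one here.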
2. Analytical Foundations: Malliavin Calculus and Operator Theory
Malliavin calculus provides a pathwise stochastic analysis framework for differentiating measures over infinite-dimensional spaces, enabling rigorous score computation for function- or field-valued data.
- Score via Malliavin Calculus: For a Gaussian process $X_t$ on a separable Hilbert space $H$,
$$\nabla \log p_t(x) = -\,\mathcal{C}_t^{\dagger}\left(x - S(t)\,x_0\right),$$
where $\mathcal{C}_t$ is the Malliavin covariance operator, $S(t)$ is the forward semigroup, and $\dagger$ denotes the pseudoinverse (Mirafzali et al., 27 Aug 2025). This formula generalizes to the Fréchet gradient for measures on $H$ and holds for general trace-class covariance operators (possibly non-diagonalizable).
- Bismut–Elworthy–Li Representation: The score in the direction $h \in H$ is expressible as
$$\langle \nabla \log p_t(x),\, h\rangle = \mathbb{E}\left[\,\delta(u^h)\;\middle|\;X_t = x\right],$$
with $u^h$ a deterministic adapted process defined via the semigroup and covariance operators, and $\delta$ the Skorokhod integral (Mirafzali et al., 27 Aug 2025, Mirafzali et al., 8 Jul 2025). In finite-dimensional SDEs, similar representations relate the gradient of $\log p_t$ to weighted Skorokhod or Itô integrals involving Jacobian (variation) processes and the Malliavin matrix (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 8 Jul 2025). Specialization to Gaussian or linear forward processes recovers classical score formulas.
- Functional Data Analysis and Kernels: The analytic form for the score enables nonparametric estimation using kernel regression (in reproducing kernel Hilbert spaces) or neural operator learning (Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023).
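A toy one-dimensional nonparametric score estimate can illustrate the kernel-based route (this is a simple kernel-density surrogate, not the RKHS regression or operator-learning estimators of the cited papers; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def kde_score(x_query, samples, h):
    """Gradient of the log of a Gaussian kernel density estimate:
    grad log p_hat(x) = sum_i w_i(x) (x_i - x) / h^2, where w_i(x) are
    softmax weights of the kernel evaluations at the data points x_i."""
    diff = samples[None, :] - x_query[:, None]        # (queries, data)
    logk = -diff**2 / (2.0 * h**2)
    w = np.exp(logk - logk.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return (w * diff).sum(axis=1) / h**2

samples = rng.standard_normal(20_000)   # data ~ N(0, 1), so the true score is -x
xq = np.linspace(-1.5, 1.5, 7)
est = kde_score(xq, samples, h=0.3)     # should track -xq up to smoothing bias
```

The bandwidth $h$ plays the role of a regularizer: the estimate is really the score of the data law convolved with $N(0, h^2)$, which is exactly the kind of smoothed target that diffusion-model training exploits.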
3. Score-Based Diffusion in Infinite Dimensions
This approach extends score-based generative modeling to function spaces or random fields:
- Infinite-dimensional Ornstein–Uhlenbeck (OU) Process: The forward process for function-valued data is typically the OU SPDE,
$$\mathrm{d}X_t = -A X_t\,\mathrm{d}t + \mathrm{d}W_t^{Q},$$
where $A$ is an unbounded (often elliptic) operator and $Q$ encodes scale and smoothness via colored noise (Hagemann et al., 2023). Trace-class assumptions on the noise covariance ensure the existence and absolute continuity of the relevant measures.
- Closed-form Score and Discretization: The Malliavin covariance operator, its pseudoinverse, and the OU semigroup permit explicit evaluation of the Fréchet derivative of the log-density, respecting the geometry of the Hilbert space (Hagemann et al., 2023, Lim et al., 2023, Mirafzali et al., 27 Aug 2025). Multilevel strategies allow training score networks on coarse grids and prolongation to higher resolutions with provable convergence in Wasserstein distance (Hagemann et al., 2023).
- Operator-valued Neural Networks: Score networks can be parameterized as operator-valued mappings (e.g., spectral-transform MLPs, kernel-integral layers) to guarantee appropriate input/output geometry and invariance properties (Hagemann et al., 2023, Lim et al., 2023).
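The forward OU dynamics on function space can be sketched by diagonalizing in an eigenbasis. Below is a NumPy toy (illustrative only, not the multilevel scheme of the cited papers) with $A = -\partial_x^2$ on $[0,1]$ under Dirichlet conditions and a trace-class noise spectrum $q_k = k^{-2}$, so each spectral mode evolves as an independent scalar OU process that can be sampled exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

K, N = 32, 128                       # spectral modes, spatial grid points
xs = np.linspace(0.0, 1.0, N)
k = np.arange(1, K + 1)
lam = (np.pi * k) ** 2               # eigenvalues of A = -d^2/dx^2 (Dirichlet)
q = 1.0 / k**2                       # noise spectrum; sum(q) < inf (trace class)
basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, xs))  # eigenfunctions, (K, N)

def forward_ou(c0, t):
    """Exact sample of the spectral OU SPDE dX = -A X dt + dW^Q at time t:
    mode k has mean e^{-lam_k t} c0_k and variance q_k (1 - e^{-2 lam_k t}) / (2 lam_k)."""
    mean = np.exp(-lam * t) * c0
    var = q * (1.0 - np.exp(-2.0 * lam * t)) / (2.0 * lam)
    return mean + np.sqrt(var) * rng.standard_normal(K)

c0 = 1.0 / k**2                      # coefficients of a smooth initial function
f0 = c0 @ basis                      # initial function on the grid
fT = forward_ou(c0, t=1.0) @ basis   # noised function: memory of f0 has decayed
```

Because the modes decouple, the same grid of coefficients can be evaluated at any spatial resolution through `basis`, which is the discretization-invariance property the operator-valued parameterizations above are designed to preserve.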
4. PDE, Entropic, and Time-Reversal Aspects
- Fokker–Planck and Score PDEs: The time evolution of densities, and crucially of the score fields themselves, is governed by associated Fokker–Planck and score Fokker–Planck equations (score FPEs) (Lai et al., 2022, Liu et al., 8 Nov 2025).
- The score FPE encodes necessary self-consistency and gradient-structure across noise levels; regularizing the learning objective to enforce this PDE improves likelihood and conservativity (Lai et al., 2022).
- Li–Yau bounds and entropy methods provide stability estimates for the reverse-time equations and establish the rate of "support collapse" onto the data manifold, enabling a quantitative trade-off between imitation fidelity and generative diversity as the stopping time or viscosity is varied (Liu et al., 8 Nov 2025).
- Entropy and Fisher Information: In the infinite-dimensional setting, Dirichlet form and log–Sobolev inequalities guarantee exponential decay of relative entropy and provide error bounds for finite-dimensional approximations and approximate scores (Greco, 19 May 2025).
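For the scalar OU prototype $\mathrm{d}X = -X\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W$ (stationary law $N(0,1)$), the log-Sobolev inequality yields entropy decay $H(t) \le e^{-2t} H(0)$; this can be checked in closed form, since Gaussian laws stay Gaussian along the flow (a scalar toy check, not the infinite-dimensional argument of the cited work):

```python
import numpy as np

def kl_to_standard_normal(m, v):
    """Relative entropy KL( N(m, v) || N(0, 1) )."""
    return 0.5 * (v + m**2 - 1.0 - np.log(v))

def ou_moments(m0, v0, t):
    """Mean and variance at time t of dX = -X dt + sqrt(2) dW from N(m0, v0)."""
    return m0 * np.exp(-t), 1.0 + (v0 - 1.0) * np.exp(-2.0 * t)

ts = np.linspace(0.0, 3.0, 31)
kls = np.array([kl_to_standard_normal(*ou_moments(2.0, 0.25, t)) for t in ts])
```

Along this trajectory the relative entropy is strictly decreasing and sits below the exponential envelope $e^{-2t} H(0)$, which is the finite-dimensional shadow of the Dirichlet-form estimates above.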
5. Algorithmic Implementations and Extensions
- Training Losses and Score Estimation:
- Denoising score-matching remains the standard training method in both finite and infinite dimensions, but alternative procedures arise via Malliavin calculus (regression of conditional expectations), kernel methods, or operator networks (Mirafzali et al., 21 Mar 2025, Mirafzali et al., 8 Jul 2025, Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023).
- Ensemble-based training-free score estimators for SPDEs offer scalable Bayesian filtering in high-dimensional scientific models (Huynh et al., 9 Aug 2025).
- Sampling Algorithms: In both standard and extended settings, sampling proceeds via numerical SDE or ODE solvers for the time-reversed process, using trained or estimated score functions. Predictor–corrector, multilevel, and "churned" samplers ensure stability and diversity (Hagemann et al., 2023, Dey et al., 2024, Chase et al., 15 May 2025).
- Applications:
- Image generation and super-resolution across discretization levels (Hagemann et al., 2023, Lim et al., 2023).
- Bayesian inverse problems (MRI, photoacoustic tomography, scientific imaging) with principled posterior uncertainty quantification (Dey et al., 2024, Chung et al., 2021, McCann et al., 2023, Feng et al., 2023).
- Adaptive scientific computing and SPDE filtering (Huynh et al., 9 Aug 2025).
- Extension to discrete state spaces using ratio-based score surrogates, enabling modeling of categorical and tokenized data (Sun et al., 2022).
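End to end, the reverse-time sampler is just a numerical SDE solver driven by the score. Below is an Euler–Maruyama sketch for a 1-D variance-exploding process with a Gaussian data law, where the exact score is available in closed form (illustrative only; real models substitute a learned score network for `score`):

```python
import numpy as np

rng = np.random.default_rng(3)

mu, var0, T, steps, n = 2.0, 0.25, 3.0, 500, 20_000

def score(x, t):
    """Exact score of p_t = N(mu, var0 + t^2) under the variance-exploding
    forward SDE dx = g(t) dW with g(t)^2 = d sigma^2/dt = 2 t (sigma(t) = t)."""
    return -(x - mu) / (var0 + t * t)

# Start from the terminal law p_T and integrate the reverse SDE
# dx = -g^2 * score dt + g dW_bar backward from T to ~0.
x = mu + np.sqrt(var0 + T * T) * rng.standard_normal(n)
ts = np.linspace(T, 1e-3, steps + 1)
for t0, t1 in zip(ts[:-1], ts[1:]):
    dt = t0 - t1                     # positive step size, integrating backward
    g2 = 2.0 * t0                    # g(t)^2 at the current time
    x = x + g2 * score(x, t0) * dt + np.sqrt(g2 * dt) * rng.standard_normal(n)
# x is now distributed approximately as the data law N(mu, var0)
```

Predictor–corrector and churned samplers refine exactly this loop: the predictor is the Euler step above, and correctors add score-driven Langevin moves at fixed noise level.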
6. Theoretical and Practical Considerations
- Operator-Theoretic Guarantees: The infinite-dimensional Malliavin–Bismut formalism and the use of trace-class noise ensure well-posedness, geometric invariance, and consistent approximation across resolutions (Mirafzali et al., 27 Aug 2025, Greco, 19 May 2025, Hagemann et al., 2023).
- Score Regularity and PDE Constraints: Regularization enforcing the score FPE, control of negative divergence, and calibration of forward/reverse SDE time endpoints balance sample diversity against over-imitation and ensure sharp stability bounds (Lai et al., 2022, Liu et al., 8 Nov 2025).
- Computational Strategies: Exact analytic scores are feasible for moderate-dimensional data or Gaussian processes; high-dimensional applications rely on neural operator architectures, extensions of kernel regression, or multilevel/flexible parameterizations (Mirafzali et al., 21 Mar 2025, Hagemann et al., 2023, Mirafzali et al., 27 Aug 2025).
- Limitations: High computational cost for exact score evaluation, necessity of trace-class noise for infinite-dimensional well-posedness, and sensitivity to score approximation error are key practical constraints (Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023).
7. Summary Table: Key Elements in Infinite-Dimensional Score-Based Diffusion
| Aspect | Approach / Formula | Reference |
|---|---|---|
| Forward diffusion (SPDE) | $\mathrm{d}X_t = -A X_t\,\mathrm{d}t + \mathrm{d}W_t^{Q}$ in $H$ | (Mirafzali et al., 27 Aug 2025) |
| Malliavin covariance operator | $\mathcal{C}_t = \int_0^t S(s)\,Q\,S(s)^{*}\,\mathrm{d}s$ | (Mirafzali et al., 27 Aug 2025) |
| Score (Fréchet derivative of log-density) | $\nabla \log p_t(x) = -\,\mathcal{C}_t^{\dagger}(x - S(t)\,x_0)$ | (Mirafzali et al., 27 Aug 2025) |
| Operator-valued score parameterization | neural operator $s_\theta : H \times [0,T] \to H$ | (Hagemann et al., 2023) |
| Sampling (reverse SDE in $H$) | $\mathrm{d}X_t = \left[-A X_t - Q\,\nabla \log p_t(X_t)\right]\mathrm{d}t + \mathrm{d}\bar{W}_t^{Q}$ | (Mirafzali et al., 27 Aug 2025, Hagemann et al., 2023) |
| PDE/entropy bounds | logarithmic Sobolev inequality (Gross LSI) | (Greco, 19 May 2025) |
References
- (Hagemann et al., 2023) Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation
- (Lim et al., 2023) Score-based Diffusion Models in Function Space
- (Mirafzali et al., 27 Aug 2025) Score-Based Diffusion Models in Infinite Dimensions: A Malliavin Calculus Perspective
- (Mirafzali et al., 21 Mar 2025) Malliavin Calculus for Score-based Diffusion Models
- (Mirafzali et al., 8 Jul 2025) A Malliavin calculus approach to score functions in diffusion generative models
- (Greco, 19 May 2025) A Malliavin-Gamma calculus approach to Score Based Diffusion Generative models for random fields
- (Liu et al., 8 Nov 2025) A PDE Perspective on Generative Diffusion Models
Score-based diffusion thus defines a unified paradigm for principled generative modeling across finite and infinite dimensions, with rigorous analytical underpinnings provided by Malliavin calculus, operator theory, and the analysis of PDEs and SDEs. This enables not only high-quality sample generation but also flexible and well-calibrated inference for scientific, medical, and engineering data.