Infinite-Dimensional Score-Based Diffusion
- Infinite-dimensional score-based diffusion is a generative framework defined on separable Hilbert spaces where diffusion and score estimation naturally extend from finite to infinite dimensions.
- It leverages SPDEs, Malliavin calculus, and neural operator techniques to ensure discretization invariance, geometric fidelity, and robust handling of function-valued data.
- The approach achieves dimension-independent performance in tasks such as sampling, Bayesian inverse problems, and functional data analysis through rigorous probabilistic guarantees and operator learning.
Infinite-dimensional score-based diffusion refers to the formulation and analysis of diffusion-based generative models where the underlying data, noise processes, and learned score fields are defined directly on infinite-dimensional function spaces—typically separable Hilbert spaces—rather than on finite-dimensional vector spaces. The mathematical and computational frameworks developed in this context rigorously extend the expressive power and theoretical foundations of score-based models to function-valued data, random fields, and SPDE-governed processes, where discretization-invariance, geometric fidelity, and dimension-independent guarantees are paramount (Mirafzali et al., 27 Aug 2025, Pidstrigach et al., 2023, Franzese et al., 2023, Lim et al., 2023, Greco, 19 May 2025).
1. Mathematical Foundations: SPDEs, Reference Measures, and Score Operators
Infinite-dimensional score-based diffusion models operate on a separable real Hilbert space H endowed with inner product ⟨·,·⟩ and norm ‖·‖, together with a fixed trace-class covariance operator C corresponding to a centered Gaussian reference measure μ = N(0, C) (Mirafzali et al., 27 Aug 2025, Pidstrigach et al., 2023). Diffusion (noising) is formulated as a linear SPDE

$$dX_t = A X_t\,dt + dW_t, \qquad X_0 = x_0,$$

where A generates a strongly continuous semigroup $(S(t))_{t \ge 0}$ and W is an H-valued Gaussian process with covariance C. The solution’s law at time t is typically Gaussian with mean $m_t = S(t)x_0$ and covariance $\Sigma_t = \int_0^t S(s)\,C\,S(s)^*\,ds$.
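In the common OU-type specialization (A = −½I), the SPDE diagonalizes in the eigenbasis of C and each mode is a scalar Ornstein–Uhlenbeck process that can be sampled exactly. A minimal NumPy sketch, where the eigenvalue decay λ_k = k⁻², the mode count, and the time horizon are illustrative choices rather than values from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

# Trace-class covariance: eigenvalues of C decay summably (here lambda_k = k^-2).
K = 200                       # number of retained spectral modes (truncation)
lam = 1.0 / np.arange(1, K + 1) ** 2

def forward_marginal(x0, t, rng):
    """Exact sample of X_t | X_0 for dX = -X/2 dt + sqrt(C) dW, mode-wise."""
    mean = np.exp(-t / 2.0) * x0
    var = lam * (1.0 - np.exp(-t))      # per-mode variance lambda_k (1 - e^{-t})
    return mean + np.sqrt(var) * rng.standard_normal(x0.shape)

x0 = np.zeros(K)
samples = np.stack([forward_marginal(x0, t=5.0, rng=rng) for _ in range(20000)])
emp_var = samples.var(axis=0)
# For large t the law converges to the Gaussian reference N(0, C),
# so each empirical mode variance should approach lambda_k.
print(np.max(np.abs(emp_var[:5] - lam[:5])))
```

Because the dynamics decouple mode-by-mode, refining the discretization only appends further (rapidly shrinking) modes; it never alters the modes already simulated.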
The density of the law of $X_t$ is defined with respect to the Gaussian reference measure μ, rather than a non-existent infinite-dimensional Lebesgue measure. The relevant score operator is the Fréchet derivative (gradient in the Cameron–Martin sense) of the log-density,

$$S(t, x) = D_x \log \frac{d\mathcal{L}(X_t)}{d\mu}(x) = -\Sigma_t^{-1}(x - m_t) \quad \text{(Gaussian case)},$$

where $\Sigma_t^{-1}$ is the pseudoinverse acting on the Cameron–Martin space (Mirafzali et al., 27 Aug 2025).
Key analytic tools include infinite-dimensional Malliavin calculus, the Bismut–Elworthy–Li (BEL) formula, and Dirichlet form methods, which justify the representation and regularity of the score, as well as the existence and uniqueness of the associated SPDEs and SDEs (Mirafzali et al., 27 Aug 2025, Greco, 19 May 2025).
2. Forward and Backward Diffusions, Time-Reversal, and Conditional Laws
The forward “noising” process is defined as an infinite-dimensional OU-type SPDE or SDE whose invariant measure is the Gaussian reference N(0, C) (Lim et al., 2023, Pidstrigach et al., 2023):

$$dX_t = -\tfrac{1}{2} X_t\,dt + \sqrt{C}\,dW_t,$$

where W is a cylindrical Wiener process, so that $\sqrt{C}\,dW_t$ injects noise along the Cameron–Martin space of C. The Fokker–Planck equation for this flow ensures exponential convergence in law to the Gaussian reference.
Time-reversal yields the “denoising” or generative SDE

$$dY_t = \Big[\tfrac{1}{2}\,Y_t + C\,S(T - t, Y_t)\Big]dt + \sqrt{C}\,d\bar{W}_t,$$

where S is computed, in the Gaussian case, by conditional expectation:

$$C\,S(t, x) = \frac{e^{-t/2}\,\mathbb{E}[X_0 \mid X_t = x] - x}{1 - e^{-t}}.$$

For conditional data modeling (e.g., Bayesian inverse problems), the score depends on observations y, and the drift correction integrates the infinite-dimensional conditional expectation (Baldassari et al., 2023):

$$C\,S(t, x \mid y) = \frac{e^{-t/2}\,\mathbb{E}[X_0 \mid X_t = x,\, y] - x}{1 - e^{-t}}.$$
In both unconditional and conditional settings, existence, uniqueness, and convergence of the reverse SDEs are established under mild regularity, and these equations retain meaning independent of the discretization scheme (Pidstrigach et al., 2023, Baldassari et al., 2023).
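The conditional-expectation representation of the score can be checked numerically mode-by-mode. The sketch below verifies the Tweedie-type identity C S(t, x) = (e^{−t/2} E[X₀ | X_t = x] − x)/(1 − e^{−t}) for a single Gaussian eigenmode, estimating the conditional expectation by importance weighting; all scalar parameters (s0, lam, t, x) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# One spectral mode: data X0 ~ N(0, s0), noise eigenvalue lam, diffusion time t.
s0, lam, t = 2.0, 0.5, 0.8
vt = np.exp(-t) * s0 + lam * (1.0 - np.exp(-t))    # marginal Var(X_t)

x = 1.3
score = -x / vt                                    # exact Gaussian score d/dx log p_t(x)

# Monte Carlo estimate of E[X0 | X_t = x] via importance weights p(x | x0).
x0 = rng.normal(0.0, np.sqrt(s0), size=400_000)
mean_t = np.exp(-t / 2.0) * x0
var_t = lam * (1.0 - np.exp(-t))                   # transition variance of X_t | X_0
w = np.exp(-((x - mean_t) ** 2) / (2.0 * var_t))
cond_mean = np.sum(w * x0) / np.sum(w)             # approx E[X0 | X_t = x]

# Tweedie-type identity: lam * score == (e^{-t/2} E[X0|X_t=x] - x) / (1 - e^{-t})
lhs = lam * score
rhs = (np.exp(-t / 2.0) * cond_mean - x) / (1.0 - np.exp(-t))
print(lhs, rhs)
```

For Gaussian data the two sides agree exactly in closed form; the Monte Carlo check only confirms the algebra.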
3. Score Estimation: Malliavin Calculus, Conditional Expectation, and Operator Learning
The central computational challenge is to estimate the infinite-dimensional score field (either unconditional or conditional), which is a mapping $S : [0, T] \times H \to H$ (or $S : [0, T] \times H \times Y \to H$ in the conditional case, with Y the observation space). There are several approaches:
- Malliavin Calculus and BEL Formula: Provides a closed-form operator-theoretic expression for the Fréchet gradient of the log-density by leveraging the infinite-dimensional BEL formula. The result preserves the function space geometry and does not rely on finite-dimensional approximations (Mirafzali et al., 27 Aug 2025).
- Dirichlet Form and Gamma Calculi: For general Gaussian measures and random fields, the score is shown to be precisely the Malliavin gradient of the log-density, and appears explicitly as a conditional expectation (Greco, 19 May 2025).
- Operator Learning: Neural operators (e.g., Fourier neural operators, DeepONet) are used to approximate the score as a function in , with training objectives formulated via empirical score-matching losses. This framework is discretization-invariant: networks trained on one grid generalize to any other mesh (Lim et al., 2023, Yang et al., 2024).
- Conditional Denoising Estimators: For conditional settings necessary in inverse problems, the conditional score is learned by minimizing the denoising loss across samples of the forward SDE and observations, and is theoretically justified to yield the true conditional posterior when the score is exact (Baldassari et al., 2023).
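Discretization invariance of spectral operator parametrizations can be illustrated with a toy score operator that, in the spirit of Fourier neural operators, acts by rescaling sine-series coefficients. The basis, the fixed gains g, and the grids below are illustrative; a trained FNO replaces the fixed gains with learned spectral weights:

```python
import numpy as np

# A toy discretization-invariant "score operator": act on a function by
# rescaling its sine-series coefficients (the FNO-style spectral pattern).
M = 8                                    # number of spectral modes acted on
g = -1.0 / (1.0 + np.arange(1, M + 1))   # fixed per-mode gains (illustrative)

def apply_score(u, x):
    """Evaluate the operator on samples u of a function at grid points x."""
    k = np.arange(1, M + 1)[:, None]
    phi = np.sqrt(2.0) * np.sin(np.pi * k * x[None, :])   # sine basis on [0, 1]
    h = x[1] - x[0]
    coeff = phi @ u * h                  # quadrature estimate of <u, phi_k>
    return (g * coeff) @ phi             # synthesize the output on the same grid

f = lambda x: np.sin(np.pi * x) + 0.3 * np.sin(3 * np.pi * x)

x_coarse = np.linspace(0.0, 1.0, 129)
x_fine = np.linspace(0.0, 1.0, 513)
out_c = apply_score(f(x_coarse), x_coarse)
out_f = apply_score(f(x_fine), x_fine)

# The same operator evaluated on different meshes agrees at shared points.
print(np.max(np.abs(out_c - out_f[::4])))
```

For band-limited inputs the discrete sine transform is resolution-exact, so the operator trained (here: defined) on one grid evaluates identically on any other mesh at shared points, which is the property the cited works exploit.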
4. Training Objectives, Theoretical Guarantees, and Discretization-Invariance
The standard infinite-dimensional score-matching loss takes the form

$$\mathcal{L}(\theta) = \mathbb{E}_{t}\,\mathbb{E}_{X_t}\Big[\lambda(t)\,\big\|S_\theta(t, X_t) - S(t, X_t)\big\|_H^2\Big],$$

with a time weighting λ(t), where the ground-truth score S is either available analytically (Gaussian case), given as a conditional expectation, or approximated via the denoising objective.
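In practice the marginal score is intractable, and the denoising variant replaces it with the gradient of the Gaussian transition density; the minimizer of the denoising loss is the true marginal score. A per-mode sketch with a linear score model S_θ(x) = θx, where the parameters s0, lam, t are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

s0, lam, t = 2.0, 0.5, 0.8
n = 200_000

x0 = rng.normal(0.0, np.sqrt(s0), size=n)
var_t = lam * (1.0 - np.exp(-t))                 # per-mode transition variance
xt = np.exp(-t / 2.0) * x0 + np.sqrt(var_t) * rng.standard_normal(n)

# Denoising target: gradient of the log transition density,
# d/dx log p_{t|0}(xt | x0), which is available in closed form.
target = -(xt - np.exp(-t / 2.0) * x0) / var_t

# Linear score model S_theta(x) = theta * x; the DSM least-squares minimizer.
theta = np.sum(xt * target) / np.sum(xt * xt)

vt = np.exp(-t) * s0 + var_t                     # marginal Var(X_t)
print(theta, -1.0 / vt)
```

The fitted slope θ converges to −1/Var(X_t), the exact Gaussian score coefficient, even though no sample ever evaluates the marginal score directly.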
Dimension-independent distance bounds are established for the Wasserstein-2 error between the generated and target measures. These bounds depend only on the score-matching risk and the discretization error, schematically

$$W_2\big(\mathcal{L}(Y_T),\, \mu_{\mathrm{data}}\big) \;\lesssim\; \varepsilon_{\mathrm{score}} + \varepsilon_{\mathrm{disc}},$$

with all constants independent of dimensional truncation. Thus, resolution refinement does not degrade the generative model's performance, provided the infinite-dimensional structures are maintained (Pidstrigach et al., 2023, Yang et al., 2024, Greco, 19 May 2025).
A critical requirement is that the noise process has a trace-class covariance (e.g., Matérn, RBF kernels); white noise is often excluded as exact score training diverges and measure equivalence fails (Lim et al., 2023).
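The trace-class requirement is easy to check numerically: for a smooth kernel, the eigenvalue sum of the discretized covariance operator approximates ∫ k(x, x) dx and stays bounded under mesh refinement, whereas the identity (white-noise) covariance has trace growing linearly with the grid size. A sketch, with the grid sizes and the length scale ell = 0.1 as illustrative choices:

```python
import numpy as np

def rbf_cov(n, length=1.0, ell=0.1):
    """Discretized RBF covariance operator on [0, 1] with n grid points."""
    x = np.linspace(0.0, length, n)
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * ell ** 2))
    h = length / (n - 1)
    return K * h      # quadrature weight: eigenvalues approximate those of C

for n in (64, 128, 256):
    ev = np.linalg.eigvalsh(rbf_cov(n))
    # The trace approximates the integral of k(x, x) over [0, 1], i.e. 1,
    # independent of resolution -- the hallmark of a trace-class operator.
    print(n, ev.sum())

# White noise has identity covariance: its trace grows linearly with n,
# so it is not trace-class in the refinement limit.
print([float(np.trace(np.eye(n))) for n in (64, 128, 256)])
```

The rapid eigenvalue decay of the smooth kernel is also what makes the Cameron–Martin space (and hence exact score training) well behaved.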
5. Practical Algorithms: Sampling, Bridge Simulation, and Neural Operator Parametrization
Sampling from infinite-dimensional diffusion models proceeds via Euler–Maruyama or predictor-corrector discretizations of the backward SDE on a chosen mesh or spectral truncation. The trained neural operator is evaluated on the discretized trajectory, enabling both unconditional generative sampling and conditional posterior sampling (for inverse problems) (Pidstrigach et al., 2023, Baldassari et al., 2024, Yang et al., 2024):
- Discretization-Invariant Sampling: Mesh-agnostic neural operators (CT-UNO, FNO, etc.) preserve model fidelity across varying resolutions (Yang et al., 2024).
- Infinite-dimensional Diffusion Bridges: Infinite-dimensional Doob h-transforms and path-space Girsanov theorems define the drift correction required for bridging sample paths, with operator-learning-based drift estimation (Yang et al., 2024).
- Amortized Posterior Inference: In the Bayesian setting, after training the conditional score, posterior samples for new data incur only standard SDE integration costs, independent of the number of PDE solves required during training (Baldassari et al., 2023).
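With the exact Gaussian score available mode-wise, the backward SDE can be integrated by Euler–Maruyama directly. The sketch below recovers the data law from reference-measure noise; the target variances s0, eigenvalues λ_k, horizon T, and step count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

K = 5
lam = 1.0 / np.arange(1, K + 1) ** 2       # eigenvalues of the noise covariance C
s0 = np.array([1.5, 0.8, 0.4, 0.2, 0.1])   # target (data) variances, mode-wise

T, steps, eps = 8.0, 400, 1e-3
dt = (T - eps) / steps

def v(t):
    """Marginal variance of the forward OU process, per mode."""
    return np.exp(-t) * s0 + lam * (1.0 - np.exp(-t))

n = 20000
y = rng.standard_normal((n, K)) * np.sqrt(v(T))      # start near N(0, C)
for i in range(steps):
    t = T - i * dt                                   # forward time at this step
    score = -y / v(t)                                # exact Gaussian score
    drift = 0.5 * y + lam * score                    # (1/2) y + C S(t, y)
    y = y + drift * dt + np.sqrt(lam * dt) * rng.standard_normal((n, K))

print(y.var(axis=0))   # approaches s0: the reverse SDE recovers the data law
```

Integration stops at the small forward time eps rather than 0, sidestepping the small-time singularity of the score; the same loop serves conditional sampling once the score is replaced by its conditional counterpart.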
Empirical results across synthetic, PDE-driven, and image data confirm that these models produce high-quality, grid-agnostic samples and resolve posterior uncertainty in high-dimensional or function-valued inverse problems (Lim et al., 2023, Franzese et al., 2023, Yang et al., 2024, Baldassari et al., 2024).
6. Connections to Variational Inference, Normalizing Flows, and Theoretical Limits
Infinite-dimensional score-based diffusion encompasses a continuous-time ELBO framework via infinite-dimensional Girsanov’s theorem, establishing a precise equivalence between score matching and optimizing a likelihood lower bound (the path-space ELBO). In the zero-noise limit, these models degenerate to continuous-time normalizing flows (neural ODEs) (Huang et al., 2021, Franzese et al., 2023).
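Schematically, the Girsanov computation behind this equivalence bounds the divergence between the data law and the model by the weighted score-matching risk plus a mixing term; the display below is a sketch of the standard form (notation as above), not a verbatim statement from the cited works:

```latex
\mathrm{KL}\big(\mu_{\mathrm{data}}\,\|\,\mathcal{L}(Y_T)\big)
\;\le\;
\frac{1}{2}\,\mathbb{E}\int_0^T
  \big\|\sqrt{C}\,\big(S(t, X_t) - S_\theta(t, X_t)\big)\big\|_H^2\,dt
\;+\;
\mathrm{KL}\big(\mathcal{L}(X_T)\,\|\,N(0, C)\big)
```

The second term decays exponentially in T by the mixing of the forward OU flow, so training reduces to controlling the first, score-matching, term.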
Explicit characterizations of how training and approximation errors propagate to sample quality, as well as convergence rates in KL or Wasserstein distance, are provided in recent analyses (Baldassari et al., 2024, Greco, 19 May 2025). Singularities may arise in the conditional score for small diffusion times, especially with noiseless observations, requiring regularization or adaptive schemes.
7. Applications, Limitations, and Future Directions
Infinite-dimensional score-based diffusion has found applications in
- dimensionality-invariant image and shape generation,
- Bayesian inference and uncertainty quantification in PDE-constrained inverse problems,
- probabilistic functional data analysis,
- simulation of conditioned stochastic shape flows and diffusion bridges in evolutionary biology, and
- modeling of random fields and spatial statistics (Lim et al., 2023, Franzese et al., 2023, Baker et al., 2024, Yang et al., 2024, Baldassari et al., 2024).
Current limitations include:
- computational cost and complexity of neural operators at massive resolutions,
- the need for careful regularization for small diffusion times or singular data,
- potential intrinsic limitations when learning scores outside the Cameron–Martin space (especially for white noise structures).
A major ongoing direction is the extension of the theory and architectures to non-Gaussian reference processes, non-linear SPDEs, and stronger generalization in out-of-distribution functional settings (Mirafzali et al., 27 Aug 2025, Yang et al., 2024).
References:
- Mirafzali et al., 27 Aug 2025
- Greco, 19 May 2025
- Pidstrigach et al., 2023
- Franzese et al., 2023
- Lim et al., 2023
- Yang et al., 2024
- Baldassari et al., 2024
- Baldassari et al., 2023
- Baker et al., 2024
- Huang et al., 2021