Gaussian Process Operator Learning
- Gaussian Process Operator Learning is a probabilistic framework that places GP priors on operators mapping between infinite-dimensional function spaces while integrating neural operator architectures.
- It employs operator-valued kernels and techniques like currying and bilinear formulations to capture complex dependencies in computational physics and PDE modeling.
- The framework delivers rigorous uncertainty quantification and scalable inference using sparse approximations, ensuring accurate predictions and reliable model calibration.
Gaussian Process Operator Learning is a family of probabilistic frameworks that leverage Gaussian process (GP) priors for learning mappings (operators) between infinite-dimensional function spaces. The methodology enables rigorous uncertainty quantification, nonparametric representational flexibility, and integration with neural operator architectures. It has emerged as an interpretable, robust, and theoretically grounded alternative and complement to deterministic deep operator learning, particularly in computational physics, scientific computing, and system identification.
1. Mathematical Foundations of GP Operator Learning
Gaussian Process Operator Learning generalizes Gaussian process regression from finite-dimensional function prediction to learning operators $\mathcal{G}: \mathcal{U} \to \mathcal{V}$, where $\mathcal{U}$ and $\mathcal{V}$ are spaces of functions. The central formalism places a GP prior on the operator:

$$\mathcal{G} \sim \mathcal{GP}(m, K),$$

where $m$ encodes the operator mean (possibly via a neural operator) and $K$ is an operator-valued covariance (kernel) on function space.
Key constructions include:
- Currying & Bilinear Formulation: Instead of representing $\mathcal{G}$ directly, various frameworks parameterize the associated curried/bilinear form $B(u, x) = \mathcal{G}(u)(x)$ and place the GP prior on $B$, as in (Mora et al., 2024).
- Lifting & Kernel Embedding: Input functions are endowed with latent feature maps (usually learned via neural operators) before Gaussian process regression on the embedded space (Kumar et al., 2024, Kumar et al., 18 Jun 2025).
- Operator-valued Kernels & Mercer Decomposition: Learning can exploit operator-valued kernels (OVKs) and their spectral decompositions for both theoretical analysis and efficient computation (Nelsen et al., 2024).
For parametric PDEs and time-dependent systems, the operator learning task seeks an approximation $\tilde{\mathcal{G}} \approx \mathcal{G}$ such that $\tilde{\mathcal{G}}(u) \approx v$ for $u \in \mathcal{U}$ and $v \in \mathcal{V}$, where $\mathcal{U}$ encodes input functions (e.g., coefficients, initial/boundary data) and $\mathcal{V}$ encodes PDE solutions.
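As a concrete illustration of the curried construction, a minimal instantiation (with notation introduced here for illustration, not drawn verbatim from any single cited work) assumes noisy pointwise observations $y_{ij} = \mathcal{G}(u_i)(x_j) + \varepsilon_{ij}$ and applies standard GP conditioning to $B(u, x) = \mathcal{G}(u)(x)$:

```latex
\begin{aligned}
B &\sim \mathcal{GP}(m, k), \qquad B(u, x) = \mathcal{G}(u)(x), \\
\mathbb{E}\!\left[B(u_*, x_*) \mid \mathcal{D}\right]
  &= m(u_*, x_*) + \mathbf{k}_*^{\top}\!\left(\mathbf{K} + \sigma^2 \mathbf{I}\right)^{-1}\!\left(\mathbf{y} - \mathbf{m}\right), \\
\operatorname{Var}\!\left[B(u_*, x_*) \mid \mathcal{D}\right]
  &= k\!\left((u_*, x_*), (u_*, x_*)\right) - \mathbf{k}_*^{\top}\!\left(\mathbf{K} + \sigma^2 \mathbf{I}\right)^{-1}\mathbf{k}_*,
\end{aligned}
```

where $\mathbf{K}$ is the Gram matrix of $k$ over all training pairs $(u_i, x_j)$ and $\mathbf{k}_*$ is the corresponding cross-covariance vector at the test pair $(u_*, x_*)$.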
2. GP Priors, Mean Functions, and Kernel Choices
The core GP operator model is defined via its mean $m$ and covariance $K$:
- Mean Function Choices: Either a zero mean (pure GP, matches nonparametric regression) or a neural operator mean (DeepONet, FNO, WNO, etc.), yielding hybrid models where the GP acts as a nonparametric uncertainty-corrector atop a deterministic deep surrogate (Mora et al., 2024, Kumar et al., 2024, Kumar et al., 18 Jun 2025). This is critical for capturing smooth trends and long-range dependencies lost with sparse/approximate kernel structures.
- Covariance Structure: The kernel is often factored in terms of input, output, and location components:
- Separable Product Kernels: A product form such as $k\big((u,x),(u',x')\big) = k_u(u,u')\,k_x(x,x')$ exploits spatial and parametric structure, admitting Kronecker factorization for tractability (Kumar et al., 18 Jun 2025, Kumar et al., 2024); a minimal sketch of this construction follows the list.
- Latent Neural Operator Embedding: Inputs are mapped via a neural operator encoder $\phi$ to latent spaces, with GP kernels applied to the embedded codes (Kumar et al., 2024, Kumar et al., 18 Jun 2025). This enables resolution independence and scalability.
- Operator-valued and Function-valued Kernels: For operator learning, the target is a random function or operator, requiring operator-valued or function-valued kernels (Nelsen et al., 2024, Souza et al., 19 Oct 2025).
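The separable product construction listed above can be sketched in a few lines of NumPy. In the sketch below, `embed`/`proj` are hypothetical stand-ins for a learned neural-operator encoder, and all input functions are assumed to share one spatial grid so that the Gram matrix factorizes as a Kronecker product:

```python
import numpy as np

def rbf(A, B, lengthscale):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def embed(U, proj):
    """Hypothetical stand-in for a learned neural-operator encoder phi(u)."""
    return U @ proj

rng = np.random.default_rng(0)
n_funcs, n_grid, latent = 8, 64, 16
U = rng.standard_normal((n_funcs, n_grid))         # discretized input functions u_i
X = np.linspace(0.0, 1.0, n_grid)[:, None]         # shared spatial grid
proj = rng.standard_normal((n_grid, latent)) / np.sqrt(n_grid)

K_u = rbf(embed(U, proj), embed(U, proj), lengthscale=2.0)   # kernel over embedded inputs
K_x = rbf(X, X, lengthscale=0.1)                             # kernel over spatial locations

# Because every function shares the same grid, the full Gram matrix over (u_i, x_j)
# pairs is the Kronecker product K_u ⊗ K_x, so its Cholesky factor and inverse
# factorize as well; the dense matrix is formed here only for illustration.
K_full = np.kron(K_u, K_x)
print(K_full.shape)   # (n_funcs * n_grid, n_funcs * n_grid) = (512, 512)
```

In practice the two Kronecker factors are kept separate and never multiplied out, which is what makes the separable structure computationally attractive.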
For probabilistic Koopman operator discovery, a GP prior is placed on the lifting (observable) functions, and the operator is learned jointly with GP hyperparameters and virtual targets (Majumdar et al., 1 Apr 2025).
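For intuition, the following sketch shows a plain EDMD-style least-squares estimate of a finite-dimensional Koopman matrix over a fixed RBF dictionary; in the probabilistic treatment of (Majumdar et al., 1 Apr 2025) this fixed dictionary would be replaced by GP-distributed observables and virtual targets, so the code below is only a hypothetical deterministic skeleton:

```python
import numpy as np

def lift(X, centers, lengthscale=0.5):
    """Fixed RBF observables psi(x); a GP-Koopman method would place a GP prior on these."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
# Snapshot pairs (x_t, x_{t+1}); here generated from a toy linear system for illustration.
A_true = np.array([[0.9, 0.1], [-0.1, 0.9]])
X = rng.standard_normal((200, 2))
Y = X @ A_true.T

centers = rng.standard_normal((20, 2))
Psi_X, Psi_Y = lift(X, centers), lift(Y, centers)

# EDMD: least-squares Koopman matrix K such that psi(x_{t+1}) ≈ psi(x_t) @ K.
K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)
eigvals = np.linalg.eigvals(K)
print(np.sort(np.abs(eigvals))[-3:])   # leading Koopman eigenvalue magnitudes
```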
3. Optimization, Scalability, and Inference Algorithms
Training and inference in GP operator learning are subject to complexity constraints due to covariance matrix inversion and hyperparameter optimization.
- Marginal Likelihood Optimization: Model fitting typically involves joint optimization of neural operator parameters, GP kernel hyperparameters, and (optionally) operator coefficients via marginal likelihood or evidence maximization (Kumar et al., 2024, Kumar et al., 18 Jun 2025).
- Auto-differentiation and Nested Optimization: Certain workflows (e.g., iGPK (Majumdar et al., 1 Apr 2025)) optimize over virtual targets and kernel parameters through nested automatic differentiation, embedding the full operator regression into a single differentiable graph.
- Sparse/Inducing-Point Approximation: To address the cubic $\mathcal{O}(N^3)$ scaling of dense GP regression, inducing-point methods (SVGP), Kronecker factorizations (spatial-product kernels), and local nearest-neighbor kernel sparsification are employed for tractable inference on large discretizations (Kumar et al., 18 Jun 2025, Kumar et al., 2024); a minimal inducing-point sketch follows this list.
- Stochastic Dual Descent (SDD): Dual optimization over representer weights and latent parameters enables linear scaling in dataset size, maintaining computational tractability and supporting resolution independence (Kumar et al., 2024).
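As a stand-alone illustration of the inducing-point idea (a Nyström-style approximation with an RBF kernel and randomly chosen inducing inputs, not the specific variational schemes of the cited works), the following sketch replaces the $\mathcal{O}(N^3)$ dense solve with an $\mathcal{O}(NM^2)$ one:

```python
import numpy as np

def rbf(A, B, lengthscale=0.5):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(2)
N, M, d, sigma2 = 2000, 50, 3, 1e-2
X = rng.standard_normal((N, d))
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(N)

Z = X[rng.choice(N, size=M, replace=False)]   # inducing inputs (random subset here)
K_zz = rbf(Z, Z) + 1e-6 * np.eye(M)           # M x M
K_xz = rbf(X, Z)                              # N x M; the dense N x N matrix is never built

# Nyström approximation K_xx ≈ K_xz K_zz^{-1} K_zx plus the Woodbury identity:
#   (K̂ + σ² I)^{-1} y = (y - K_xz (σ² K_zz + K_zx K_xz)^{-1} K_zx y) / σ²
B = sigma2 * K_zz + K_xz.T @ K_xz
alpha = (y - K_xz @ np.linalg.solve(B, K_xz.T @ y)) / sigma2

X_test = rng.standard_normal((5, d))
mean_test = rbf(X_test, X) @ alpha            # approximate posterior mean (simplified)
print(mean_test)
```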
A pseudocode sketch for scalable GPO training incorporates sparse kernels, neural operator embeddings, variational ELBO optimization, and expressive mean functions (Kumar et al., 18 Jun 2025):
```python
# Pseudocode (names are schematic): one training loop of a scalable GP operator model
for epoch in range(T):
    for minibatch in batches:
        mean = wno(minibatch.inputs)                     # compute mean via the WNO surrogate
        codes = neural_op_encoder(minibatch.inputs)      # embed inputs with the neural operator
        K_x = knn_spatial_kernel(minibatch.coords)       # build local (KNN-sparsified) spatial kernel
        K_u = feature_kernel(codes, inducing_points)     # assemble feature kernel with inducing points
        loss = -elbo(mean, K_x, K_u, minibatch.targets)  # evaluate the variational ELBO on the batch
        loss.backward()                                  # backpropagate gradients
        optimizer.step(); optimizer.zero_grad()          # update parameters
```
4. Uncertainty Quantification and Propagation
The predictive posterior of GP operator models provides principled uncertainty bounds:
- Analytic Covariance Propagation: For linear (e.g., lifted Koopman) models, the predictive mean and covariance propagate in closed form, e.g., $\mu_{t+1} = K\mu_t$ and $\Sigma_{t+1} = K\Sigma_t K^\top + \Sigma_\epsilon$ (Majumdar et al., 1 Apr 2025); see the sketch after this list.
- Epistemic and Aleatoric Components: Total uncertainty decomposes into epistemic (model/hyperparameter) and aleatoric (observational noise) terms (Kumar et al., 2024).
- Calibration of Confidence Intervals: Empirical studies report tight 95% credible bands, which contract with increasing data density and correctly encapsulate true dynamics even under measurement noise and resolution mismatches (Kumar et al., 18 Jun 2025, Kumar et al., 2024, Majumdar et al., 1 Apr 2025).
- Bayesian Operator Inference: GP smoothing of state and derivative estimates, followed by Bayesian regression, yields reduced operators with propagated uncertainty—allowing analytic credible intervals for system surrogate predictions (McQuarrie et al., 2024).
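The closed-form propagation referenced above can be illustrated for a generic linear operator; the matrices below (operator $K$, noise covariance $\Sigma_\epsilon$, initial state) are illustrative placeholders rather than quantities from the cited works:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 4
K = 0.4 * rng.standard_normal((dim, dim))      # generic linear (e.g., lifted Koopman) operator
Sigma_eps = 0.01 * np.eye(dim)                 # assumed process/observation noise covariance

mu = np.zeros(dim)                             # initial predictive mean
Sigma = 0.1 * np.eye(dim)                      # initial predictive covariance

variances = []
for t in range(20):
    # Closed-form Gaussian push-forward through a linear map:
    #   mu_{t+1} = K mu_t,   Sigma_{t+1} = K Sigma_t K^T + Sigma_eps
    mu = K @ mu
    Sigma = K @ Sigma @ K.T + Sigma_eps
    variances.append(np.diag(Sigma).copy())    # marginal variances yield credible bands

print(np.round(variances[-1], 4))
```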
5. Connections to Operator-valued Kernels, Deep Neural Operators, and Infinite-width Limits
- Operator/Function-valued Gaussian Processes: GP operator learning is deeply linked to operator-valued kernel ridge regression; model architectures may leverage random feature expansions for approximation with theoretical error guarantees (Nelsen et al., 2024). A minimal random-feature sketch follows this list.
- Infinite Neural Operator (NO) Limit: Arbitrary-depth neural operators with Gaussian-initialized convolution kernels converge, in the infinite-width limit, to function-valued GPs. These NO-GP kernels encode architectural inductive biases: spectral selectivity (band-limited FNO), smoothness (Matérn), and nonlinear compositionality (Souza et al., 19 Oct 2025).
- Dual Formulation and Physics-informed Extensions: Modeling the operator's bilinear form (rather than function values directly) facilitates Kronecker-efficient training and leverages physics-informed mean/kernel regularization for enhanced sample efficiency and accuracy (Mora et al., 2024).
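As a minimal illustration of the random-feature idea in the scalar-kernel case (not the full operator-valued construction of Nelsen et al., 2024), random Fourier features approximate a squared-exponential kernel by an explicit finite-dimensional inner product:

```python
import numpy as np

rng = np.random.default_rng(4)
d, D, lengthscale = 2, 500, 0.7

# Random Fourier features for the squared-exponential kernel:
#   k(x, x') = exp(-||x - x'||^2 / (2 l^2)) ≈ phi(x)^T phi(x'),
# with W ~ N(0, l^{-2} I) and b ~ Uniform[0, 2π].
W = rng.standard_normal((D, d)) / lengthscale
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

X = rng.standard_normal((6, d))
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / lengthscale**2)
K_rff = phi(X) @ phi(X).T
print(np.max(np.abs(K_exact - K_rff)))   # approximation error shrinks as D grows
```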
6. Empirical Performance and Practical Applications
Gaussian Process Operator Learning has demonstrated:
- High Accuracy and Robustness: On parametric PDE benchmarks (e.g., Burgers, wave-advection, Darcy flow, Navier–Stokes), scalable GPO and hybrid GP-NO frameworks achieve sub-2% relative error, match or surpass deterministic neural operators (WNO, FNO), and maintain predictive bands that remain well-calibrated under spatial super-resolution and noisy data (Kumar et al., 18 Jun 2025, Kumar et al., 2024, Majumdar et al., 1 Apr 2025, Mora et al., 2024).
- Nonlinear Operator Discovery: Nonparametric Volterra Kernels Model (NVKM) provides a scalable, GP-based route to Bayesian learning of nonlinear operators with explicit path sampling, outperforming latent force and recurrent GP baselines in multi-output system identification and meteorological data regression (Ross et al., 2021).
- Functional PDEs and Physics Applications: Functional GP operator learning enables solution of renormalization group equations (Wetterich, Wilson–Polchinski), supports nonconstant field surrogates, and offers improved accuracy over local-potential approximations in lattice field theory (Yang et al., 24 Dec 2025).
- Probabilistic Reduced-order Modeling: Bayesian-GP operator inference yields stability-regularized, uncertainty-aware surrogates for time-dependent nonlinear systems, compressible flows, and epidemiological ODE compartment models (McQuarrie et al., 2024).
Table: Comparative Accuracy (Relative Error, %)

| Framework | Burgers | Wave-Adv. | Darcy (tri) | N–Stokes |
|-------------------|---------|-----------|-------------|----------|
| LoGoS-GPO | 0.86 | 0.43 | 1.38 | 2.01 |
| GPO (std) | 2.89 | 0.63 | 2.18 | 2.21 |
| SVGP | 3.81 | 1.81 | 5.18 | 7.89 |
| WNO (determin.) | 4.09 | 1.01 | 2.21 | 12.23 |
7. Scalability, Resolution Independence, and Future Directions
Typical challenges addressed by recent developments:
- Scalable Inference: Sparse spatial kernels (KNN), inducing-point approximations, and Kronecker factorization reduce memory and runtime, yielding near-linear scaling in grid size and batch size (Kumar et al., 18 Jun 2025, Kumar et al., 2024); a minimal KNN-sparsified kernel sketch follows this list.
- Resolution Independence: Neural operator-embedded kernels allow GP evaluation over arbitrary discretizations, supporting zero-shot super-resolution and cross-mesh inference (Kumar et al., 2024, Kumar et al., 18 Jun 2025).
- Extensions: Future directions include hybridizing deep kernel learning, adaptive kernel selection, multi-fidelity and hierarchical GPs, operator-valued NTK analysis, and physics-informed regularization (Souza et al., 19 Oct 2025, Mora et al., 2024, Yang et al., 24 Dec 2025).
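A minimal sketch of the KNN-sparsified spatial kernel mentioned above, assuming a SciPy k-d tree and a simple symmetrized truncation of an RBF kernel (not the exact construction of the cited works):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree

rng = np.random.default_rng(5)
n_grid, k, lengthscale = 4096, 16, 0.05
pts = rng.uniform(0.0, 1.0, size=(n_grid, 2))       # 2-D spatial grid points

# Keep only the k nearest neighbours of each point: O(n k) nonzeros
# instead of a dense O(n^2) Gram matrix.
dist, idx = cKDTree(pts).query(pts, k=k)
rows = np.repeat(np.arange(n_grid), k)
vals = np.exp(-0.5 * (dist.ravel() / lengthscale) ** 2)   # truncated RBF entries
K_sparse = csr_matrix((vals, (rows, idx.ravel())), shape=(n_grid, n_grid))

# Symmetrize the truncation; in practice a small nugget/jitter term is also added
# to keep the sparsified covariance well conditioned.
K_sparse = K_sparse.maximum(K_sparse.T)
print(K_sparse.nnz, "nonzeros vs", n_grid**2, "dense entries")
```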
Gaussian Process Operator Learning provides a mathematically rigorous, uncertainty-aware, and scalable foundation for learning complex operator mappings in scientific computing, physical modeling, system identification, and beyond. Its integration with neural architectures and operator-valued kernel methods continues to expand its reach in high-dimensional, data-intensive, and resolution-varying domains.