
Function-on-Function Gaussian Process (FFGP)

Updated 18 November 2025
  • FFGP is a framework for modeling mappings between infinite-dimensional function spaces using operator-valued kernels and Hilbert space representations.
  • It employs nonparametric Bayesian regression and efficient eigendecomposition to enable accurate operator learning without discretization approximations.
  • FFGP extends classical Gaussian processes to functional inputs and outputs, offering enhanced uncertainty quantification and scalable computation for complex systems.

A function-on-function Gaussian process (FFGP) is a mathematical framework for modeling mappings where both the input and the output reside in infinite-dimensional function spaces. The FFGP formalism enables nonparametric Bayesian regression, operator learning, and uncertainty quantification in diverse fields, including functional data analysis, operator learning for partial differential equations, and Bayesian optimization in complex system design. FFGPs directly model the joint distribution over functions, enabling efficient and flexible inference without discretization or basis-expansion approximations that are traditionally required for functional inputs or outputs.

1. Definition and Mathematical Framework

An FFGP models a mapping $f: \mathcal{X}^p \rightarrow \mathcal{Y}$, where $\mathcal{X}^p = \mathcal{X} \times \cdots \times \mathcal{X}$ and $\mathcal{X} \subset L^2(\Omega_x)$ for compact $\Omega_x \subset \mathbb{R}^d$, and $\mathcal{Y} = L^2(\Omega_y)$ for compact $\Omega_y$ (often $[0,1]$). Both spaces are Hilbert spaces with $L^2$ inner product structure. The FFGP is characterized by specifying a mean function $\mu \in \mathcal{Y}$ and a positive-definite operator-valued kernel $K: \mathcal{X}^p \times \mathcal{X}^p \to \mathcal{L}(\mathcal{Y})$, leading to the Gaussian process prior

$$f(\cdot) \sim \operatorname{FFGP}\big(\mu,\, K(\cdot, \cdot)\big).$$

For any finite collection of inputs $x_1, \ldots, x_n \in \mathcal{X}^p$, the outputs $[f(x_i)]_{i=1}^n$ jointly follow a (functional) Gaussian law in $(L^2)^n$, with mean $[\mu, \ldots, \mu]$ and blockwise covariance $K(x_i, x_j)$ (Huang et al., 16 Nov 2025).

2. Operator-Valued Kernels in FFGPs

FFGPs use operator-valued kernels to encode dependencies between function-valued inputs and outputs. The standard construction is the separable operator-valued kernel $K(x, x') = \sigma^2\, k_x(x, x')\, T_{\mathcal{Y}}$, where $k_x$ is a positive-definite scalar kernel on $\mathcal{X}^p \times \mathcal{X}^p$, typically built from the $L^2$ distance $r(x, x') = \| (x - x') / \psi_x \|_{L^2}$ together with a Matérn-$\nu$ form for smoothness control, and $T_{\mathcal{Y}} \in \mathcal{L}(\mathcal{Y})$ is a nonnegative self-adjoint operator, often a Hilbert–Schmidt integral operator with a kernel such as $k_y(s, t) = \exp(-|s-t|/\psi_y)$ or the Wiener kernel $k_y(s, t) = \min(s,t)/\psi_y$ (Huang et al., 16 Nov 2025).

Under this construction, the covariance between $f(x)(t)$ and $f(x')(t')$ decomposes as

$$\mathrm{Cov}\big(f(x)(t),\, f(x')(t')\big) = \sigma^2\, k_x(x, x')\, k_y(t, t').$$
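
To make the separable construction concrete, the sketch below discretizes both domains on grids, builds $k_x$ from the $L^2$ distance between input functions with a Matérn-5/2 form, uses the exponential kernel for $T_{\mathcal{Y}}$, and draws output curves from the finite-dimensional Gaussian law of Section 1. Grid sizes, hyperparameters, and the choice $\nu = 5/2$ are illustrative assumptions, not values from the cited work.

```python
import numpy as np

# Minimal sketch of the separable FFGP prior under assumed hyperparameters.
# Input functions are represented by their values on a grid of Omega_x = [0, 1];
# output functions live on a grid of Omega_y = [0, 1].
rng = np.random.default_rng(0)
s_grid = np.linspace(0.0, 1.0, 50)            # grid for input functions
t_grid = np.linspace(0.0, 1.0, 40)            # grid for output functions
ds = s_grid[1] - s_grid[0]
sigma2, psi_x, psi_y = 1.0, 0.5, 0.3          # illustrative hyperparameters

def k_x(xa, xb):
    """Matern-5/2 scalar kernel on the scaled L2 distance r = ||xa - xb||_{L2} / psi_x."""
    r = np.sqrt(np.sum((xa - xb) ** 2) * ds) / psi_x
    return (1.0 + np.sqrt(5) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5) * r)

# Discretized T_Y: exponential kernel k_y(s, t) = exp(-|s - t| / psi_y).
T_y = np.exp(-np.abs(t_grid[:, None] - t_grid[None, :]) / psi_y)

# Two example input functions and their scalar-kernel Gram matrix.
x1, x2 = np.sin(2 * np.pi * s_grid), np.cos(3 * np.pi * s_grid)
Kx = np.array([[k_x(x1, x1), k_x(x1, x2)],
               [k_x(x2, x1), k_x(x2, x2)]])

# Joint covariance of [f(x1), f(x2)] on t_grid has the Kronecker (blockwise) form
# Cov(f(x)(t), f(x')(t')) = sigma^2 * k_x(x, x') * k_y(t, t').
C = sigma2 * np.kron(Kx, T_y)
draw = rng.multivariate_normal(np.zeros(C.shape[0]), C + 1e-8 * np.eye(C.shape[0]))
f_x1, f_x2 = draw[:t_grid.size], draw[t_grid.size:]   # two correlated output curves
```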

3. Posterior Inference and Predictive Distributions

Given observations $y_i = f(x_i) + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, \tau^2 I_{\mathcal{Y}})$ in $\mathcal{Y}$, posterior inference exploits the eigendecomposition of $T_{\mathcal{Y}}$ and the Gram matrix of $k_x$. For a new input $x$, the posterior mean and covariance operator are

$$\begin{aligned} \hat{f}(x) &= \mu + K_n(x)^\top \left( \mathcal{K}_n + \tau^2 I_{\mathcal{Y}} \right)^{-1} (Y_n - 1_n \mu), \\ \widehat{K}(x, x) &= K(x, x) - K_n(x)^\top \left( \mathcal{K}_n + \tau^2 I_{\mathcal{Y}} \right)^{-1} K_n(x), \end{aligned}$$

where $\mathcal{K}_n = [K(x_i, x_j)]_{i,j}$, $K_n(x) = [K(x_i, x)]_{i=1}^n$, and $Y_n$ stacks all observed functions. Series expressions are derived using the eigendecomposition $T_{\mathcal{Y}} v_i = \beta_i v_i$, truncating when $\sum_{i=1}^m \beta_i$ captures $>90\%$ of the trace of $T_{\mathcal{Y}}$ (Huang et al., 16 Nov 2025).
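
Under the separable kernel of Section 2 and white observation noise, projecting the data onto the leading eigenfunctions $v_1, \ldots, v_m$ of $T_{\mathcal{Y}}$ decouples the posterior into $m$ independent scalar GP regressions. The sketch below illustrates this; it assumes $L^2$-orthonormal eigenfunctions and centered training curves already projected onto them, and it is not the reference implementation of Huang et al. (16 Nov 2025).

```python
import numpy as np

# Hedged sketch of FFGP posterior-mean computation via the eigendecomposition
# T_Y v_i = beta_i v_i, assuming the separable kernel K = sigma^2 * k_x * T_Y
# and white noise tau^2 * I_Y. Each retained eigenmode behaves as an
# independent scalar GP with kernel sigma^2 * beta_i * k_x and noise tau^2.
def ffgp_posterior_mean(Kx_train, kx_star, Y_coef, betas, sigma2, tau2):
    """Posterior mean coefficients at a new input x*.

    Kx_train : (n, n) Gram matrix of k_x at the training inputs
    kx_star  : (n,)   values k_x(x_j, x*) for the new input
    Y_coef   : (n, m) centered training curves projected onto v_1..v_m
    betas    : (m,)   leading eigenvalues of T_Y
    Returns  : (m,)   coefficients c_i with f_hat(x*)(t) = mu(t) + sum_i c_i v_i(t)
    """
    n, m = Y_coef.shape
    mean_coef = np.empty(m)
    for i in range(m):
        C = sigma2 * betas[i] * Kx_train + tau2 * np.eye(n)   # mode-i covariance
        alpha = np.linalg.solve(C, Y_coef[:, i])
        mean_coef[i] = sigma2 * betas[i] * (kx_star @ alpha)
    return mean_coef
```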

4. Computational Complexity and Scalability

Training FFGP models centers on the eigendecomposition of the $n \times n$ Gram matrix (complexity $O(n^\omega)$, with $2 < \omega < 2.376$) and the spectral representation of $T_{\mathcal{Y}}$. Each log-likelihood gradient evaluation costs $O(N_{mc}\, n^2 m + (p+3)\, n m)$, where $N_{mc}$ is the functional norm computation cost and $m$ the retained eigenspectrum rank. Prediction at a new input, post-truncation, requires $O(m n + m^2)$ operations (Huang et al., 16 Nov 2025).
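
For illustration, the $>90\%$-of-trace truncation rule quoted in Section 3 can be implemented in a few lines; here `betas` is assumed to hold the eigenvalues of a (discretized) $T_{\mathcal{Y}}$, as in the earlier sketches.

```python
import numpy as np

# Choose the truncation rank m as the smallest number of leading eigenvalues
# of T_Y whose cumulative sum captures a target fraction of the trace.
def truncation_rank(betas, frac=0.9):
    betas = np.sort(np.asarray(betas, dtype=float))[::-1]   # descending eigenvalues
    captured = np.cumsum(betas) / betas.sum()               # fraction of trace captured
    return int(np.searchsorted(captured, frac) + 1)
```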

Scalable extensions build on this low-rank spectral truncation of $T_{\mathcal{Y}}$, retaining only the leading $m$ eigenmodes; the resulting truncation error decays as $O(m^{-1})$ (Huang et al., 16 Nov 2025).

5. Comparison with Related Approaches

The FFGP extends classical GP regression to function-valued input–output mappings in a mathematically consistent way:

  • Multi-output GPs/matrix-valued kernels (e.g., Conti–O’Hagan, Bonilla et al.) handle vector outputs via discretization or fixed basis but cannot natively handle infinite-dimensional function outputs.
  • Functional-input Bayesian optimization (FIBO) targets function-to-scalar mappings in RKHS but does not support functional outputs.
  • FOBO models (functional-output Bayesian optimization) handle scalar or vector inputs mapped to functional outputs via FPCA discretization, introducing discretization error and potentially losing accuracy on irregular grids.
  • The FFGP achieves full infinite-dimensional modeling without pre-discretization—enabling accurate operator learning and uncertainty quantification (Huang et al., 16 Nov 2025, Lowery et al., 24 Oct 2025).

6. Modern Architectures and Extensions

Several architectures build upon and generalize the FFGP concept:

  • Deep Gaussian Processes for Functional Maps (DGPFM) stack layers of GP-based integral transforms and nonlinear GP activations to model highly nonlinear function-on-function maps. Discrete approximations of kernel integral transforms collapse to direct functional transforms, enabling scalable inference and uncertainty quantification. Empirically, DGPFM outperforms Bayesian neural operators and FNO-based architectures in predictive accuracy and uncertainty calibration on PDE and real-world datasets (Lowery et al., 24 Oct 2025). A minimal sketch of a discretized kernel integral transform appears after this list.
  • Linearization-based function-valued GPs for neural operators construct a Laplace-approximated Bayesian posterior in neural operator weight space, propagate it via first-order Taylor expansion, and "curry" the joint GP over input-function/evaluation pairs into a function-on-function GP. Resolution-agnostic, efficient sampling is achieved via the spectral representation of the neural operator, with closed-form predictions for entire output functions (Magnani et al., 7 Jun 2024).
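
As a concrete illustration of the first item above, the following is a minimal sketch of a discretized kernel integral transform, the basic operation applied in DGPFM-style layers before their GP activations. The Gaussian kernel, quadrature weights, and grid sizes are assumptions for illustration only and do not reproduce the architecture of Lowery et al. (24 Oct 2025).

```python
import numpy as np

# Hedged sketch: a kernel integral transform h(t) = \int k(t, s) f(s) ds,
# approximated on a grid. This shows only the discretization step; the kernel
# choice and lengthscale below are illustrative assumptions.
def kernel_integral_transform(f_vals, s_grid, t_grid, lengthscale=0.2):
    """Map samples f(s_grid) to h(t_grid) via quadrature over s."""
    w = np.gradient(s_grid)                              # approximate quadrature weights
    K = np.exp(-0.5 * (t_grid[:, None] - s_grid[None, :]) ** 2 / lengthscale**2)
    return K @ (w * f_vals)                              # h(t) ~= sum_s w_s k(t, s) f(s)

# Example: transform a sine wave observed on 64 points onto a coarser 32-point grid.
s = np.linspace(0.0, 1.0, 64)
t = np.linspace(0.0, 1.0, 32)
h = kernel_integral_transform(np.sin(2 * np.pi * s), s, t)
```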

7. Implementation, Practicalities, and Applications

The FFGP paradigm underlies a range of practical frameworks:

  • The GPFDA package implements GP-based function-on-function regression, including both concurrent and historical mean structures, flexible kernel specification (separable, tensor-product, non-separable), and closed-form prediction. Predictions are available both when part of a new response curve is observed (Type I) and when entirely new functional covariates are supplied (Type II). GPFDA also leverages marginal likelihood for hyperparameter selection and supports additive and nonstationary kernels (Konzen et al., 2021).
  • FFGP-based surrogates are used in function-on-function Bayesian optimization (FFBO), where UCB-style acquisition functions use operator-weighted scalarizations, and scalable function-space gradient ascent algorithms search for optimal input functions. Theoretical guarantees include well-posedness of the posterior, truncation error decaying as $O(m^{-1})$ with increasing truncation rank, and high-probability sublinear regret in Bayesian optimization (Huang et al., 16 Nov 2025). A hedged sketch of a UCB-style acquisition on the truncated eigenbasis follows this list.
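
As referenced in the last item above, the following is a hedged sketch of a UCB-style acquisition evaluated on the retained eigenmodes. The scalarization weights `w` and the exploration parameter `kappa` are assumptions for illustration; the operator-weighted scalarization of Huang et al. (16 Nov 2025) is defined in the paper and not reproduced exactly here.

```python
import numpy as np

# Hedged sketch of a UCB-style acquisition for FFBO on the m retained eigenmodes.
# Assumes the per-mode posteriors are independent (a consequence of the separable
# kernel with white noise), so the scalarized variance is sum_i w_i^2 * var_i.
def ucb_acquisition(mean_coef, var_coef, w, kappa=2.0):
    """mean_coef, var_coef, w : (m,) posterior mean coefficients, posterior
    variances, and scalarization weights for the retained eigenmodes."""
    scalar_mean = w @ mean_coef
    scalar_std = np.sqrt(np.maximum(w**2 @ var_coef, 0.0))
    return scalar_mean + kappa * scalar_std
```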

FFGP models are deployed in applications involving functional data on irregular grids, spatiotemporal operator learning, and design optimization under complex constraints—offering significant improvements in data efficiency and uncertainty quantification relative to discretization-based GP surrogates, neural operators, or functional regression methods.

