Multi-Dimensional GP Regression

Updated 3 August 2025
  • MD-GPR is a probabilistic framework extending classic Gaussian processes to handle high-dimensional inputs and outputs with adaptive covariance structures.
  • It leverages adaptive neural architectures, multiple kernel learning, and tensor-valued methods to model nonlinear correlations and reduce dimensionality.
  • Scalable techniques like Kronecker and HODLR approximations ensure efficient uncertainty quantification and enhanced prediction accuracy in complex applications.

Multi-Dimensional Gaussian Process Regression (MD-GPR) is a probabilistic framework that extends classical Gaussian process models to address regression problems involving high-dimensional input and/or output spaces, often incorporating structured dependencies, multiple tasks, or complex functional forms. MD-GPR encompasses a diverse set of model architectures, inference strategies, covariance structures, and computational methods designed to address the curse of dimensionality, capture correlations, and achieve computational scalability. Below, the principal methodologies and developments in this domain are organized according to foundational aspects, advanced model architectures, scalable computation, dimensionality reduction, and practical applications.

1. Foundational Model Structures

Adaptive Network Architectures

The Gaussian Process Regression Network (GPRN) framework combines the structural flexibility of Bayesian neural network architectures with nonparametric GP priors (Wilson et al., 2011). The basic model is

y(x) = W(x)\,[f(x) + \sigma_f \varepsilon] + \sigma_y z,

where the node functions f(x) and adaptive weight functions W(x) are independent GPs. This construction allows each output y_i(x) to exhibit an input-dependent covariance:

k_{y_i}(x,x') = \sum_{j=1}^{q} W_{ij}(x)\,\big[k_{f_j}(x,x') + \sigma_f^2 \delta_{xx'}\big]\,W_{ij}(x') + \sigma_y^2,

which encodes input-dependent amplitude, length-scales, and noise correlations. The model supports infinite-dimensional mixing for deep, flexibly coupled multi-output regression and multivariate volatility modeling, with demonstrated performance improvements in both SMSE and likelihood-based metrics over various contenders.
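The induced covariance above can be evaluated directly once draws (or estimates) of the weight functions are available. The following minimal numpy sketch, with hypothetical kernels, length-scales, and noise levels, computes k_{y_i}(x,x') for one output over a grid of scalar inputs; in the actual GPRN the weights and node functions are inferred jointly rather than fixed as here.

```python
import numpy as np

def rbf(x1, x2, ls=1.0):
    # Squared-exponential kernel on scalar inputs
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
sigma_f, sigma_y = 0.1, 0.05
q = 2  # number of latent node functions feeding output y_i

# Draw the adaptive weight functions W_i1(x), W_i2(x) as GP samples; in the
# full GPRN these are inferred jointly with the node functions f_j.
Kw = rbf(x, x, ls=2.0) + 1e-8 * np.eye(len(x))
W = rng.multivariate_normal(np.zeros(len(x)), Kw, size=q)      # shape (q, 50)

# Induced input-dependent covariance of output y_i:
# k_{y_i}(x,x') = sum_j W_ij(x) [k_{f_j}(x,x') + sigma_f^2 delta_{xx'}] W_ij(x')
# with the observation noise sigma_y^2 placed on the diagonal.
K_f = rbf(x, x, ls=1.0)
K_y = sum(np.outer(W[j], W[j]) * (K_f + sigma_f**2 * np.eye(len(x))) for j in range(q))
K_y += sigma_y**2 * np.eye(len(x))

print(K_y.shape)            # (50, 50)
print(np.diag(K_y)[:5])     # marginal variance varies with x through W(x)
```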

Multiple Kernel Learning for Input Structure

Multiple Gaussian Process (MGP) models (Archambeau et al., 2011) approach high-dimensional regression via convex combinations of kernel matrices, where each kernel captures potentially distinct input feature sets. The hierarchical Bayesian construction

y \mid \gamma \sim \mathcal{N}\Big(0,\ \sum_p \gamma_p^{-1} K_p + \tau^{-1} I\Big)

with generalized inverse Gaussian (GIG) priors on the kernel weights enforces both sparsity and robustness, providing a flexible setting for selecting among candidate input structures.
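As an illustration of the convex-combination construction, the sketch below (with made-up data, feature subsets, and weight values) evaluates the marginal likelihood of y under different weight configurations; the full MGP model places GIG priors on the weights and infers them, rather than comparing fixed values as done here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def rbf(X, ls):
    # Squared-exponential Gram matrix on a chosen feature subset
    d2 = ((X[:, None, :] - X[None, :, :]) / ls) ** 2
    return np.exp(-0.5 * d2.sum(-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)   # only the first features matter

# Candidate kernels, each built on a different (hypothetical) feature subset
K_list = [rbf(X[:, :2], ls=1.0), rbf(X[:, 2:4], ls=1.0), rbf(X[:, 4:], ls=1.0)]

def log_marginal(gamma, tau=100.0):
    # y | gamma ~ N(0, sum_p gamma_p^{-1} K_p + tau^{-1} I)
    C = sum(K / g for K, g in zip(K_list, gamma)) + np.eye(len(y)) / tau
    return multivariate_normal(mean=np.zeros(len(y)), cov=C).logpdf(y)

# Comparing weight configurations indicates which feature subsets matter;
# the MGP posterior over gamma automates this comparison with sparsity-inducing priors.
print(log_marginal(np.array([1.0, 10.0, 10.0])))
print(log_marginal(np.array([10.0, 10.0, 1.0])))
```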

Additive and Structured Covariance Models

Additivity and structured covariance play pivotal roles in MD-GPR. Models such as additive GPs, projection pursuit GPs (PPGPR), and Kronecker-structured covariance models (Gilboa et al., 2012, Semochkina et al., 14 Feb 2025) leverage separability and dimension-specific modeling to enable scalable regression with interpretable dependencies between inputs and outputs.

2. Advanced Multi-Dimensional Architectures

Tensor-Valued and Multi-Output GPs

The tensor-variate GP (TvGP) framework (Semochkina et al., 14 Feb 2025) generalizes GP regression to m-way tensor outputs with Kronecker-separable covariance,

\mathrm{Cov}\big(F(x), F(x')\big) = \kappa(x, x')\,(\Sigma_1 \otimes \cdots \otimes \Sigma_m).

This approach supports architectures such as:

  • Outer Product Emulators (OPE), imposing separability in both mean and covariance, appropriate for outputs with strong spatial/temporal correlation.
  • Parallel Partial Emulators (PPE), modeling outputs as independent GPs, efficient for unstructured outputs.

Applicability is enhanced for spatial-temporal simulators and multi-task learning, where the preservation of natural tensor structure yields both computational savings and improved uncertainty quantification.
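The computational benefit of the Kronecker-separable covariance is that sampling (and, analogously, solving) factors along tensor modes. A minimal sketch, assuming a squared-exponential input kernel and arbitrary positive definite Σ_1, Σ_2, draws a 2-way tensor-valued GP sample without ever forming the full K_x ⊗ Σ_1 ⊗ Σ_2 matrix.

```python
import numpy as np

rng = np.random.default_rng(2)

def kappa(x1, x2, ls=1.0):
    # Squared-exponential input kernel
    return np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / ls) ** 2)

# 2-way tensor output F(x) of shape (d1, d2) at each of n input locations
d1, d2, n = 3, 4, 20
A1, A2 = rng.normal(size=(d1, d1)), rng.normal(size=(d2, d2))
Sigma1, Sigma2 = A1 @ A1.T + d1 * np.eye(d1), A2 @ A2.T + d2 * np.eye(d2)

x = np.linspace(0, 1, n)
Kx = kappa(x, x) + 1e-8 * np.eye(n)

# The Cholesky factor of Kx ⊗ Sigma1 ⊗ Sigma2 is the Kronecker product of the
# factor Choleskys, so a joint sample only needs mode-wise multiplications.
Lx, L1, L2 = (np.linalg.cholesky(M) for M in (Kx, Sigma1, Sigma2))
z = rng.normal(size=(n, d1, d2))
F = np.einsum('ab,bcd->acd', Lx, z)     # apply Lx along the input mode
F = np.einsum('ce,aed->acd', L1, F)     # apply L1 along the first output mode
F = np.einsum('df,acf->acd', L2, F)     # apply L2 along the second output mode
# In a suitable vectorization ordering, vec(F) ~ N(0, Kx ⊗ Sigma1 ⊗ Sigma2).
print(F.shape)  # (20, 3, 4): one 3x4 tensor output per input location
```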

Autoregressive and Deep Models

The Gaussian Process Autoregressive Regression (GPAR) framework (Requeima et al., 2018) models joint multi-output regression via a chain-rule decomposition,

p(y_1,\ldots,y_D \mid x) = p(y_1 \mid x)\, p(y_2 \mid x, y_1) \cdots p(y_D \mid x, y_1, \ldots, y_{D-1}),

allowing each conditional to be a flexible GP. This structure efficiently captures nonlinear, input-dependent dependencies between outputs while retaining scalability via standard single-output inference techniques and inducing-point methods.
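A bare-bones illustration of the chain-rule idea: each output is regressed with an ordinary GP on the inputs augmented by the preceding outputs. This sketch uses a fixed RBF kernel, posterior means only, and feeds predicted means forward at test time, which simplifies the full GPAR treatment.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) / ls) ** 2
    return np.exp(-0.5 * d2.sum(-1))

def gp_fit_predict(Xtr, ytr, Xte, noise=1e-2):
    # Standard single-output GP regression (posterior mean only, for brevity)
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    return rbf(Xte, Xtr) @ np.linalg.solve(K, ytr)

rng = np.random.default_rng(3)
x = np.linspace(0, 4, 60)[:, None]
y1 = np.sin(x[:, 0]) + 0.05 * rng.normal(size=60)
y2 = np.cos(x[:, 0]) * y1 + 0.05 * rng.normal(size=60)   # depends nonlinearly on y1
Y = np.stack([y1, y2], axis=1)

tr, te = np.arange(0, 60, 2), np.arange(1, 60, 2)

# GPAR-style decomposition: model p(y1 | x), then p(y2 | x, y1), each as a GP.
preds = []
aug_tr, aug_te = x[tr], x[te]
for d in range(Y.shape[1]):
    mu = gp_fit_predict(aug_tr, Y[tr, d], aug_te)
    preds.append(mu)
    # Condition later outputs on this one: observed values at training inputs,
    # predicted means at test inputs (a simple plug-in choice).
    aug_tr = np.hstack([aug_tr, Y[tr, d:d + 1]])
    aug_te = np.hstack([aug_te, mu[:, None]])

print(np.mean((preds[1] - Y[te, 1]) ** 2))
```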

3. Scalable Computation for High-Dimensional Data

| Method | Key Principle | Complexity / Scalability Features |
| --- | --- | --- |
| Additive GPs via state-space models | Per-dimension GPs and backfitting | Linear in N (Gilboa et al., 2012) |
| Kronecker-structured GPs | Product kernels and Kronecker algebra | O(D m^{1+1/D}) for D-dimensional grids |
| Hierarchical off-diagonal low-rank (HODLR) | HODLR decomposition of the covariance matrix | O(n \log^2 n), amenable to distributed use |
| Split GPs for streaming data | Recursive partitioning, local GPs with PCA | Memory O(mn), update time O(m^3) |
| Randomly projected additive GPs | Sum of random 1D kernel projections | Near-linear via SKI (Delbridge et al., 2019) |

Projection pursuit and dimension-expansion models further alleviate the curse of dimensionality:

  • By expanding inputs to a higher-dimensional set of learned projections, the function is expressed as an additive sum of univariate GPs (Chen et al., 2020), formally,

y(x) = f_1(w_1^T x) + \cdots + f_M(w_M^T x),

with M \gg d projections learned via likelihood-based gradient descent.

  • Random projections or a deterministic spread (e.g., via orthogonalization) make the combined kernel converge to standard high-dimensional kernels (RBF/IMQ) as the number of projections increases, ensuring both expressivity and rapid inference (Delbridge et al., 2019); a numerical sketch of this convergence follows below.
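The convergence claim in the last bullet can be checked numerically. The sketch below builds the averaged one-dimensional kernel over random Gaussian projections and compares it with the inverse multiquadric kernel that is its large-M limit under this particular projection distribution (an illustrative choice, not the only scheme considered by Delbridge et al., 2019).

```python
import numpy as np

rng = np.random.default_rng(4)
d, M = 20, 2000

def k1d(s, t, ls=1.0):
    # Univariate squared-exponential kernel on projected coordinates
    return np.exp(-0.5 * ((s[:, None] - t[None, :]) / ls) ** 2)

def projected_additive_kernel(X1, X2, W):
    # k(x, x') = (1/M) * sum_m k1d(w_m^T x, w_m^T x')
    P1, P2 = X1 @ W.T, X2 @ W.T
    return np.mean([k1d(P1[:, m], P2[:, m]) for m in range(W.shape[0])], axis=0)

X = rng.normal(size=(30, d))
W = rng.normal(size=(M, d)) / np.sqrt(d)     # random Gaussian 1-D projections

K_proj = projected_additive_kernel(X, X, W)

# With w_m ~ N(0, I/d), averaging exp(-(w^T(x - x'))^2 / 2) over projections
# converges to the inverse multiquadric kernel (1 + ||x - x'||^2 / d)^(-1/2).
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_imq = 1.0 / np.sqrt(1.0 + D2 / d)

print(np.abs(K_proj - K_imq).max())          # gap shrinks as M grows
```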

Factorization and Kronecker-structured methods (Lyu et al., 12 Sep 2024, Semochkina et al., 14 Feb 2025) provide scalable and distributed implementations, with the Kronecker structure leveraging separable kernels for large, grid-structured data and the HODLR approach enabling recursive low-rank approximations.
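The core trick behind Kronecker-structured scalability is the identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which turns a product-kernel matrix-vector product on a grid into two small matrix multiplications. A toy verification follows; the grid sizes are kept small only so the dense comparison fits in memory.

```python
import numpy as np

def rbf(x, ls=1.0):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

# Inputs on a Cartesian grid: the joint kernel matrix is K1 ⊗ K2, so products
# never require forming the (n1*n2) x (n1*n2) matrix.
n1, n2 = 40, 50
K1 = rbf(np.linspace(0, 1, n1))
K2 = rbf(np.linspace(0, 1, n2), ls=0.3)

rng = np.random.default_rng(5)
v = rng.normal(size=n1 * n2)

V = v.reshape(n2, n1, order='F')                 # column-major vec convention
Kv_fast = (K2 @ V @ K1.T).reshape(-1, order='F')

Kv_dense = np.kron(K1, K2) @ v                   # dense check, feasible at this size
print(np.allclose(Kv_fast, Kv_dense))            # True
```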

4. Dimensionality Reduction and Manifold Structure

Active Subspaces and Additivity

Combining additivity with active subspaces (AS) (Binois et al., 6 Feb 2024) provides a hybrid structure:

Y_E(x) = \rho\, Y_C(x) + \delta(Ax), \qquad Y_C(x) \perp \delta(Ax),

where Y_C(x) is an additive coarse-level GP and \delta(Ax) is a GP on the learned linear subspace. AS is identified via gradient covariance eigen-decomposition, capturing main effects additively and high-order interactions in a low-dimensional subspace.
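A small numerical sketch of the active-subspace step, using analytic gradients of a toy function whose variation is concentrated on one direction (in practice gradients come from the model or finite differences, and the additive coarse-level GP handles the remaining main effects):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 10, 2000

# Toy function varying mostly along a single direction a (a stand-in example)
a = rng.normal(size=d); a /= np.linalg.norm(a)
f      = lambda X: np.sin(X @ a) + 0.1 * (X @ a) ** 2
grad_f = lambda X: (np.cos(X @ a) + 0.2 * (X @ a))[:, None] * a[None, :]

X = rng.normal(size=(n, d))
G = grad_f(X)

# Active subspace: eigen-decomposition of the gradient covariance C = E[∇f ∇fᵀ]
C = G.T @ G / n
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals[:3])          # sharp drop after the first eigenvalue
A = eigvecs[:, :1].T        # rows of A span the (here 1-D) active subspace
print(abs(A @ a))           # close to 1: recovered direction aligns with a
```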

Manifold Gaussian Processes

Recent methodology infers implicit low-dimensional manifolds directly from data without explicit parametrization (Fichera et al., 2023). Approaches construct differentiable graph Laplacians converging to the Laplace–Beltrami operator, defining kernels over the latent geometry. Fully differentiable end-to-end inference enables joint optimization of the manifold structure and GP hyperparameters by backpropagation, scaling to hundreds of thousands of data points.

Theoretical guarantees demonstrate convergence, under increasing data density, to a Matérn Gaussian Process defined on the manifold, ensuring predictive calibration and robustness to high-dimensional noise.
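The flavor of the construction can be conveyed with a rough sketch: build a Gaussian-weighted graph Laplacian on noisy samples from a circle and form a Matérn-style kernel from its spectrum. The normalization, the spectral exponent, and the differentiable, jointly learned graph of the cited method are all simplified away here.

```python
import numpy as np

rng = np.random.default_rng(7)

# Data on a 1-D manifold (circle) embedded in a noisy, higher-dimensional space
n = 300
theta = rng.uniform(0, 2 * np.pi, n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
X = np.hstack([X, 0.05 * rng.normal(size=(n, 8))])   # 8 pure-noise dimensions

# Gaussian-weighted graph Laplacian (a finite-sample surrogate for Laplace–Beltrami)
eps = 0.2
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Wg = np.exp(-D2 / (4 * eps))
L = (np.diag(Wg.sum(1)) - Wg) / eps

# Matérn-style kernel built from the Laplacian spectrum (conventions vary):
# K ∝ Phi (2 nu / kappa^2 + Lambda)^(-nu) Phiᵀ
lam, Phi = np.linalg.eigh(L)
nu, kappa = 1.5, 1.0
K = Phi @ np.diag((2 * nu / kappa**2 + lam) ** (-nu)) @ Phi.T
K /= K.max()
print(K.shape)   # (300, 300) kernel respecting the circular geometry
```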

Multi-Fidelity Dimension Reduction

Rotated Multi-fidelity GP models (RMFGP) (Zhang et al., 2022) use dimension reduction (e.g., via SAVE) on low-fidelity data to guide multi-fidelity GP surrogates and subsequent supervised projection, iteratively optimizing rotation matrices and employing Bayesian active learning for high-fidelity sample selection:

M = \frac{1}{H} \sum_{h=1}^{H} E_n\big[I(y \in J_h)\big]\,\big(I_p - \mathrm{Var}_n(Z \mid y \in J_h)\big)^2,

with eigenvectors yielding the central subspace. RMFGP enables uncertainty quantification and robust surrogate modeling even when high-fidelity data is scarce.
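A minimal sketch of the slicing construction behind the rotation matrix, following the displayed formula with a SAVE-style estimator on toy data (the slice boundaries, sample sizes, and constant scaling are illustrative choices; only the eigenvectors matter):

```python
import numpy as np

rng = np.random.default_rng(8)
d, n, H = 6, 4000, 10

# Toy data: the response depends on x only through the direction b
b = np.zeros(d); b[0] = 1.0
X = rng.normal(size=(n, d))
y = (X @ b) ** 2 + 0.1 * rng.normal(size=n)

# Standardize inputs: Z = Sigma^{-1/2} (x - mean(x))
Xc = X - X.mean(0)
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)

# M = (1/H) sum_h E_n[I(y in J_h)] (I_p - Var_n(Z | y in J_h))^2 over quantile slices
edges = np.quantile(y, np.linspace(0, 1, H + 1))
bins = np.clip(np.searchsorted(edges, y, side='right') - 1, 0, H - 1)
M = np.zeros((d, d))
for h in range(H):
    Zh = Z[bins == h]
    A = np.eye(d) - np.cov(Zh, rowvar=False)
    M += (len(Zh) / n) * A @ A
M /= H   # constant scaling does not change the eigenvectors

w, V = np.linalg.eigh(M)
print(V[:, np.argmax(w)])   # leading eigenvector is approximately ±b
```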

5. Advanced Covariance Structures

Multiscale and Multilevel Covariances

For multiscale data assimilation (Barajas-Solano et al., 2018), bivariate or multivariate Matérn covariance kernels capture dependencies between fine and coarse spatial scales, e.g.

C(\mathbf{r}) = \begin{pmatrix} \sigma_c^2 M(\mathbf{r} \mid \nu_c, \lambda_c^{-2} I_d) + \sigma_{nc}^2 \mathbf{1}_{\|\mathbf{r}\|=0} & \rho \sigma_c \sigma_f M(\mathbf{r} \mid \nu_{cf}, \lambda_{cf}^{-2} I_d) \\ \rho \sigma_c \sigma_f M(\mathbf{r} \mid \nu_{cf}, \lambda_{cf}^{-2} I_d) & \sigma_f^2 M(\mathbf{r} \mid \nu_f, \lambda_f^{-2} I_d) + \sigma_{nf}^2 \mathbf{1}_{\|\mathbf{r}\|=0} \end{pmatrix},

where cross-covariances and scale-specific smoothness are estimated directly from data via pseudo-likelihood or LOO-CV.
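A sketch of how this block covariance is assembled for coarse and fine observation sets, using one common Matérn parameterization and purely illustrative hyperparameter values (the cited work estimates them by pseudo-likelihood or LOO-CV, and positive definiteness constrains the admissible cross-covariance parameters):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import gamma, kv

def matern(r, nu, lam):
    # Matérn correlation M(r | nu, lam); one common parameterization
    z = np.sqrt(2 * nu) * np.asarray(r, dtype=float) / lam
    out = np.ones_like(z)
    nz = z > 0
    out[nz] = (2 ** (1 - nu) / gamma(nu)) * z[nz] ** nu * kv(nu, z[nz])
    return out

rng = np.random.default_rng(9)
xc = rng.uniform(0, 1, size=(30, 2))    # coarse-scale observation locations
xf = rng.uniform(0, 1, size=(80, 2))    # fine-scale observation locations

# Illustrative (hypothetical) hyperparameters
sc, sf, rho = 1.0, 0.6, 0.5
nuc, nuf, nucf = 1.5, 2.5, 2.0
lc, lf, lcf = 0.3, 0.1, 0.2
snc2, snf2 = 1e-3, 1e-3

Ccc = sc**2 * matern(cdist(xc, xc), nuc, lc) + snc2 * np.eye(len(xc))
Cff = sf**2 * matern(cdist(xf, xf), nuf, lf) + snf2 * np.eye(len(xf))
Ccf = rho * sc * sf * matern(cdist(xc, xf), nucf, lcf)

C = np.block([[Ccc, Ccf], [Ccf.T, Cff]])
# Validity of the joint model constrains (rho, nu_cf, lambda_cf); check numerically here.
print(np.linalg.eigvalsh(C).min() > 0)
```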

Generalized GPs for Non-Gaussian Functional Data

Extensions for non-Gaussian functional regression (Wang et al., 2014) integrate exponential family likelihoods and concurrent modeling of scalar and functional covariates, with the latent process governed by a multi-dimensional kernel (e.g., squared exponential with dimension-specific weights or blended with linear terms), estimated by Laplace approximation for empirical Bayes inference.
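To make the ingredients concrete, the following sketch pairs an ARD squared-exponential kernel (dimension-specific weights) with a Poisson likelihood and finds the Laplace-approximation mode by Newton iterations; the kernel choice, weights, and data are illustrative, not those of the cited study.

```python
import numpy as np

rng = np.random.default_rng(10)

def ard_se(X1, X2, weights):
    # Squared-exponential kernel with dimension-specific weights (ARD)
    d2 = (weights * (X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

# Latent GP with an exponential-family (Poisson) likelihood: y_i ~ Poisson(exp(f_i))
n, d = 80, 3
X = rng.normal(size=(n, d))
f_true = np.sin(X @ np.array([1.0, 0.5, 0.0]))   # third covariate is irrelevant
y = rng.poisson(np.exp(f_true))

K = ard_se(X, X, np.array([1.0, 0.3, 0.0])) + 1e-6 * np.eye(n)
K_inv = np.linalg.inv(K)                          # acceptable for a small demo

# Laplace approximation: Newton iterations for the mode of p(f | y)
f = np.zeros(n)
for _ in range(25):
    grad = (y - np.exp(f)) - K_inv @ f            # gradient of the log posterior
    H = -np.diag(np.exp(f)) - K_inv               # Hessian (negative definite)
    f = f - np.linalg.solve(H, grad)

print(np.corrcoef(f, f_true)[0, 1])               # posterior mode tracks f_true
```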

Distribution-Based GPs via Optimal Transport

Gaussian processes indexed by multidimensional distributions (Bachoc et al., 2018) are constructed by embedding each probability measure \mu, via its optimal transport map to a Wasserstein barycenter \bar{\mu}, into L^2(\bar{\mu}), thus enabling the construction of strictly positive definite Hilbertian radial kernels:

K(\mu, \nu) = F\big(\|T_\mu^{-1} - T_\nu^{-1}\|_{L^2(\bar{\mu})}\big),

with theoretical guarantees for consistency and microergodicity in infinite-dimensional settings.
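For one-dimensional distributions the optimal transport maps are monotone and reduce to quantile functions, which gives a cheap way to sketch the kernel construction; the multidimensional barycenter machinery of the cited paper is not reproduced here, and the Gaussian choice of F is just one admissible radial function.

```python
import numpy as np

rng = np.random.default_rng(11)

# In 1-D the transport map to the barycenter is monotone, and the distance
# between (inverse) maps reduces to an L2 distance between quantile functions,
# so a quantile grid gives a cheap embedding of each distribution.
def quantile_embedding(samples, qs):
    return np.quantile(samples, qs)

qs = np.linspace(0.01, 0.99, 99)

# A small collection of input "points", each of which is a distribution
dists = [rng.normal(loc=m, scale=s, size=500)
         for m, s in [(0, 1), (0.2, 1), (2, 1), (0, 3)]]
E = np.stack([quantile_embedding(s, qs) for s in dists])        # shape (4, 99)

# Hilbertian radial kernel K(mu, nu) = F(||.||), here with a Gaussian F
D = np.sqrt(((E[:, None, :] - E[None, :, :]) ** 2).mean(-1))    # ≈ W2 distances
K = np.exp(-0.5 * (D / 1.0) ** 2)
print(np.round(K, 3))   # similar distributions get kernel values near 1
```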

6. Applications and Empirical Findings

MD-GPR methods are deployed across domains, including multi-output and multi-task learning, multivariate volatility modeling, emulation of spatial-temporal simulators, multiscale data assimilation, and multi-fidelity surrogate modeling.

Performance evaluations across studies show that advanced MD-GPR methods (e.g., GPRN, PPGPR, TvGP, OPE) consistently outperform simpler or unstructured GP baselines, particularly in high-dimensional or correlated data regimes, with significant reductions in mean prediction error, computational time, and improved uncertainty calibration.

7. Summary and Outlook

MD-GPR encompasses a suite of architectural, computational, and statistical tools that extend the applicability of Gaussian processes to high-dimensional, structured, and computationally challenging regression problems. By leveraging network-inspired structures, active subspaces, additivity, manifold learning, and scalable covariance representations (Kronecker, HODLR), these models overcome many classical limitations, enabling efficient, interpretable, and uncertainty-aware learning. Nevertheless, ongoing research is directed at the further integration of supervised and unsupervised dimension reduction, automatic selection of subspace and multi-fidelity structures, uncertainty-aware active learning, and the development of uncertainty quantification protocols that scale to massive, noisy, and heterogeneously structured scientific datasets.
