Multioutput Gaussian Processes
- Multioutput Gaussian Processes are nonparametric frameworks that model vector-valued functions by jointly capturing correlations among outputs, and serve as priors over latent factors in tensor decomposition.
- They employ hierarchical shrinkage priors to enable automatic rank selection and prune inactive components for efficient modeling.
- Variational inference with closed-form updates ensures scalable learning and robust performance in tensor completion and factorization tasks.
A multioutput Gaussian process (MOGP) is a nonparametric probabilistic framework for modeling vector-valued functions, allowing flexible and joint treatment of correlated outputs or tensor-valued signals over continuous index sets. Within functional tensor decomposition and completion, MOGPs serve as priors over latent factor functions, encoding both intra-factor smoothness and inter-factor dependencies, supporting scalable inference, automatic rank selection, and functional universality.
1. Mathematical Foundations of Multioutput GPs in Tensor Decomposition
Consider a $D$-mode functional tensor $\mathcal{X}$, defined over $\Omega = \Omega_1 \times \cdots \times \Omega_D$, each $\Omega_d \subset \mathbb{R}$. The canonical polyadic (CP)-type decomposition for the noise-free field is
$$\mathcal{X}(x_1,\dots,x_D) \;=\; \sum_{r=1}^{R} \prod_{d=1}^{D} f_r^{(d)}(x_d),$$
where each $f_r^{(d)} : \Omega_d \to \mathbb{R}$ is a latent function. For parsimonious, flexible modeling of $\mathbf{f}^{(d)} = \bigl(f_1^{(d)},\dots,f_R^{(d)}\bigr)^{\top}$, an $R$-output vector of functions, the multioutput Gaussian process prior is
$$\mathbf{f}^{(d)} \sim \mathcal{GP}\!\bigl(\mathbf{0},\; k_d(x_d, x_d')\,\boldsymbol{\Sigma}\bigr), \qquad d = 1,\dots,D,$$
with $k_d(\cdot,\cdot)$ a positive definite kernel capturing smoothness over $\Omega_d$, and row covariance $\boldsymbol{\Sigma} \in \mathbb{R}^{R \times R}$ controlling power across CP components. The observed data are then
$$y_{\mathbf{i}} \;=\; \mathcal{X}(x_{1,i_1},\dots,x_{D,i_D}) + \varepsilon_{\mathbf{i}}, \qquad \varepsilon_{\mathbf{i}} \sim \mathcal{N}(0,\tau^{-1}),$$
for multi-indices $\mathbf{i}$ in a (possibly incomplete) observation set.
Each MOGP thus defines a prior over multivariate, mode-specific latent factors, enabling the application of the full machinery of Gaussian process-based function learning for both discrete and continuous-domain tensor settings (Li et al., 25 Dec 2025).
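As a concrete illustration of the separable (kernel-times-row-covariance) structure above, the following sketch draws correlated factor functions from such a prior on a 1-D grid. The RBF kernel, the particular $\boldsymbol{\Sigma}$, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.2):
    """Gram matrix of an RBF kernel k_d on a 1-D grid (illustrative choice)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def sample_mogp_factors(x_d, Sigma, n_samples=1, jitter=1e-8):
    """Draw R correlated factor functions f_r^(d) on the grid x_d.

    The separable MOGP prior has Cov[f_r(x), f_r'(x')] = Sigma[r, r'] * k_d(x, x'),
    i.e. a Kronecker-structured covariance Sigma (x) K_d over the stacked factors.
    """
    K_d = rbf_kernel(x_d)                      # I_d x I_d smoothness over Omega_d
    cov = np.kron(Sigma, K_d)                  # (R*I_d) x (R*I_d) joint covariance
    cov += jitter * np.eye(cov.shape[0])       # numerical stabilization
    draws = np.random.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_samples)
    R, I_d = Sigma.shape[0], x_d.shape[0]
    return draws.reshape(n_samples, R, I_d)    # each sample: R factor functions on the grid

# Example: R = 3 CP components on a grid of 50 points in [0, 1]
x = np.linspace(0, 1, 50)
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 0.5, 0.1],
                  [0.0, 0.1, 0.2]])            # power/correlation across components
F = sample_mogp_factors(x, Sigma)              # shape (1, 3, 50)
```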
2. Hierarchical Shrinkage Priors and Automatic Rank Determination
Extending basic MOGPs, a hierarchical Bayesian scheme is employed for rank learning. Discretizing $\Omega_d$ at grids $\{x_{d,1},\dots,x_{d,I_d}\}$, let $\mathbf{F}^{(d)} \in \mathbb{R}^{I_d \times R}$ be the sampled factor matrix. The prior is:
$$\mathrm{vec}\bigl(\mathbf{F}^{(d)}\bigr) \sim \mathcal{N}\!\bigl(\mathbf{0},\; \boldsymbol{\Lambda}^{-1} \otimes \mathbf{K}^{(d)}\bigr), \qquad \boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1,\dots,\lambda_R),$$
where $\mathbf{K}^{(d)} = \bigl[k_d(x_{d,i}, x_{d,j})\bigr]_{i,j=1}^{I_d}$ derives from $k_d$ and the row covariance $\boldsymbol{\Sigma}$ is specialized to the diagonal $\boldsymbol{\Lambda}^{-1}$. Componentwise, $\mathbf{f}_r^{(d)} \sim \mathcal{N}\bigl(\mathbf{0},\, \lambda_r^{-1}\mathbf{K}^{(d)}\bigr)$, and the shrinkage parameter $\lambda_r$ has a Gamma prior:
$$\lambda_r \sim \mathrm{Gamma}(a_0, b_0), \qquad r = 1,\dots,R.$$
As posterior inference proceeds, some $\lambda_r$ diverge, shrinking the corresponding rank-1 components towards zero; the effective rank is manifest as the number of “active” components $r$ for which $\lambda_r$ remains finite. This induces a variational form of sparse Bayesian learning over a superposition of MOGP terms; automatic rank selection is thereby achieved (Li et al., 25 Dec 2025).
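A minimal sketch of the rank read-out this implies: after inference, components whose shrinkage precision has diverged are dropped, and the effective rank is the count of the survivors. The threshold and helper function below are hypothetical, chosen only to illustrate the pruning rule.

```python
import numpy as np

def effective_rank(lambda_mean, factors, threshold=1e6):
    """Read off the effective CP rank from the learned shrinkage precisions.

    lambda_mean : array (R,), posterior means E_q[lambda_r]
    factors     : list of D arrays, each (I_d, R), the sampled factor matrices F^(d)
    threshold   : illustrative cutoff (not from the paper); components whose
                  precision has diverged past it are treated as pruned
    """
    active = lambda_mean < threshold                  # "active" components keep finite lambda_r
    pruned_factors = [F_d[:, active] for F_d in factors]
    return int(active.sum()), pruned_factors

# Example: over-specified initial rank R = 8; shrinkage deactivates four components
lam = np.array([2.1, 1e9, 0.7, 5e8, 3.4, 2e7, 9e9, 1.2])
factors = [np.random.randn(I_d, 8) for I_d in (30, 40, 25)]
R_eff, factors = effective_rank(lam, factors)         # R_eff == 4
```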
3. Variational Inference and Efficient Algorithmic Realizations
Inference is performed via mean-field variational Bayes, with the Evidence Lower Bound (ELBO):
$$\mathcal{L}(q) \;=\; \mathbb{E}_{q(\Theta)}\!\bigl[\log p(\mathbf{y}, \Theta)\bigr] \;-\; \mathbb{E}_{q(\Theta)}\!\bigl[\log q(\Theta)\bigr],$$
where $\Theta = \bigl\{\{\mathbf{F}^{(d)}\}_{d=1}^{D},\, \{\lambda_r\}_{r=1}^{R},\, \tau\bigr\}$ collects the latent factors, shrinkage parameters, and noise precision, and blockwise coordinate-ascent updates are derived for each factor. Crucially:
- Each factor posterior $q\bigl(\mathrm{vec}(\mathbf{F}^{(d)})\bigr) = \mathcal{N}\bigl(\boldsymbol{\mu}^{(d)}, \mathbf{V}^{(d)}\bigr)$ is Gaussian, with closed-form updates for $\boldsymbol{\mu}^{(d)}$ and $\mathbf{V}^{(d)}$ that involve expectations over Khatri–Rao products and GP kernel inverses.
- $q(\lambda_r)$ and $q(\tau)$ remain conjugate Gamma distributions.
The dominant computational cost is inverting $I_d \times I_d$ covariance matrices per mode $d$ and factor $r$, scaling as $\mathcal{O}\bigl(R \sum_{d=1}^{D} I_d^{3}\bigr)$ per sweep. Unused CP terms are pruned as their $\lambda_r$ explode, yielding a practical reduction in complexity and improved convergence (Li et al., 25 Dec 2025).
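For concreteness, here is a sketch of the conjugate Gamma update for the shrinkage precisions implied by the prior of Section 2 under a standard mean-field factorization; the hyperparameter values and function signature are illustrative assumptions, and the paper's exact update may differ in detail.

```python
import numpy as np

def update_q_lambda(a0, b0, grid_sizes, E_quad):
    """Conjugate Gamma update for the shrinkage precisions lambda_r.

    With prior f_r^(d) ~ N(0, lambda_r^{-1} K^(d)) across modes d = 1..D and
    lambda_r ~ Gamma(a0, b0), the mean-field posterior q(lambda_r) is Gamma with

        shape_r = a0 + 0.5 * sum_d I_d
        rate_r  = b0 + 0.5 * sum_d E_q[ f_r^(d)^T (K^(d))^{-1} f_r^(d) ]

    grid_sizes : sequence of grid sizes I_d, one per mode
    E_quad     : array (D, R) of the expected quadratic forms above
    """
    shape = a0 + 0.5 * sum(grid_sizes)          # same shape for every component r
    rate = b0 + 0.5 * E_quad.sum(axis=0)        # one rate per component r
    return shape, rate                          # E_q[lambda_r] = shape / rate

# Example: D = 3 modes, R = 5 components (toy expected quadratic forms)
E_quad = np.abs(np.random.randn(3, 5)) * 10
shape, rate = update_q_lambda(a0=1e-6, b0=1e-6, grid_sizes=(30, 40, 25), E_quad=E_quad)
lam_mean = shape / rate                         # feeds the pruning rule sketched above
```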
4. Universality and Theoretical Properties
If each kernel $k_d$ defines a universal reproducing kernel Hilbert space (RKHS) on a compact domain $\Omega_d$, then the CP sum of products of MOGP mean functions can uniformly approximate any continuous function on $\Omega = \Omega_1 \times \cdots \times \Omega_D$. Specifically, the RR-FBTC model has the universal approximation property: for any continuous target $\mathcal{X}^{\star}$ on $\Omega$ and any $\epsilon > 0$, there exist parameters such that the posterior mean
$$\widehat{\mathcal{X}}(x_1,\dots,x_D) \;=\; \sum_{r=1}^{R} \prod_{d=1}^{D} \widehat{f}_r^{(d)}(x_d)$$
satisfies $\sup_{\mathbf{x}\in\Omega}\bigl|\widehat{\mathcal{X}}(\mathbf{x}) - \mathcal{X}^{\star}(\mathbf{x})\bigr| < \epsilon$, provided $R$ is sufficiently large. This extends the universality of kernel methods to the tensor-valued, multioutput setting, supporting arbitrarily expressive models for continuous tensor signals (Li et al., 25 Dec 2025).
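An illustrative numerical check of this statement in the two-mode case (not the paper's proof): discretize a smooth bivariate function and observe how the sup-norm error of its best rank-$R$ sum-of-separable-products approximation, obtained here via truncated SVD, decays as $R$ grows.

```python
import numpy as np

# Target: a continuous function on [0,1]^2 that is not itself rank-1 separable.
x = np.linspace(0, 1, 200)
X1, X2 = np.meshgrid(x, x, indexing="ij")
target = np.sin(3 * np.pi * X1 * X2) + np.exp(-(X1 - X2) ** 2)

# Rank-R approximations by sums of products of univariate (discretized) functions,
# obtained here via truncated SVD of the sampled grid.
U, s, Vt = np.linalg.svd(target, full_matrices=False)
for R in (1, 2, 4, 8, 16):
    approx = (U[:, :R] * s[:R]) @ Vt[:R, :]      # sum_{r<=R} u_r(x1) * v_r(x2)
    print(R, np.max(np.abs(target - approx)))    # sup-norm error shrinks as R grows
```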
5. Empirical Performance and Benchmarking
RR-FBTC and related MOGP-based tensor decomposition frameworks demonstrate state-of-the-art results across synthetic and real-world data. Empirical evaluations include:
- Synthetic tensors of various sizes and ranks, the US-Temperature dataset, 3D sound-speed fields, and image inpainting benchmarks.
- Key metrics: relative root square error (RRSE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM); a minimal computation of RRSE and RMSE is sketched after this list.
- RR-FBTC consistently achieves lower RRSE and RMSE than Bayesian CP, neural network CP, and single-output GP baselines, with superior rank recovery even at high noise and low observation rates.
- The learned basis functions from RR-FBTC reflect physically meaningful patterns (e.g., latitude/longitude temperature variation) and are robust to the initial rank overparameterization.
- Computationally, RR-FBTC exhibits competitive or superior run-times relative to continuous-CP neural methods and earlier functional-GP approaches (Li et al., 25 Dec 2025).
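For reference, a minimal sketch of the RRSE and RMSE metrics listed above, evaluated on held-out entries; the definitions follow common usage, and the paper's exact conventions may differ.

```python
import numpy as np

def rmse(x_true, x_hat):
    """Root mean square error over the evaluated entries."""
    return np.sqrt(np.mean((x_true - x_hat) ** 2))

def rrse(x_true, x_hat):
    """Relative root square error: residual norm over the norm of the ground truth."""
    return np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)

# Example on synthetic held-out test entries of a completed tensor
x_true = np.random.randn(1000)
x_hat = x_true + 0.1 * np.random.randn(1000)
print(rmse(x_true, x_hat), rrse(x_true, x_hat))
```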
6. Model Class Extensions and Related Methodological Developments
Other Bayesian tensor models generalize or complement MOGP-based approaches:
- Global–local priors such as the Horseshoe (“one-group” priors) induce rank-sparse decomposition via heavy-tailed shrinkage on CP or TT components, achieving “tuning-free” model adaptation and strong finite-sample performance (Gilbert et al., 2019).
- Bayesian tensor train (TT) factorization with Gaussian-product-Gamma hyperpriors supports automatic slicing-rank determination and scalable variational inference (Xu et al., 2020).
- Hierarchical sparsity-inducing priors on non-CP representations, e.g., re-weighted Laplace or mixture-of-Gaussians models, provide mechanisms for modeling non-low-rank residual structure and outlier effects (Zhang et al., 2017).
These developments integrate MOGP machinery with probabilistic low-rank tensor learning, rank regularization, and flexible prior architectures.
References:
- "When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning" (Li et al., 25 Dec 2025)
- "Tuning Free Rank-Sparse Bayesian Matrix and Tensor Completion with Global-Local Priors" (Gilbert et al., 2019)
- "Beyond Low Rank: A Data-Adaptive Tensor Completion Method" (Zhang et al., 2017)
- "Tensor Train Factorization and Completion under Noisy Data with Prior Analysis and Rank Estimation" (Xu et al., 2020)
- "Rank regularization and Bayesian inference for tensor completion and extrapolation" (Bazerque et al., 2013)