
Multioutput Gaussian Processes

Updated 2 January 2026
  • Multioutput Gaussian processes are nonparametric frameworks that model vector-valued functions by jointly capturing correlations among outputs, and serve as priors over latent factors in tensor decomposition.
  • They employ hierarchical shrinkage priors to enable automatic rank selection and prune inactive components for efficient modeling.
  • Variational inference with closed-form updates ensures scalable learning and robust performance in tensor completion and factorization tasks.

A multioutput Gaussian process (MOGP) is a nonparametric probabilistic framework for modeling vector-valued functions, allowing flexible and joint treatment of correlated outputs or tensor-valued signals over continuous index sets. Within functional tensor decomposition and completion, MOGPs serve as priors over latent factor functions, encoding both intra-factor smoothness and inter-factor dependencies while supporting scalable inference, automatic rank selection, and functional universality.

1. Mathematical Foundations of Multioutput GPs in Tensor Decomposition

Consider a $K$-mode functional tensor $\mathcal{Y}(x^{(1)}, \dots, x^{(K)})$, defined over $\mathcal{X}^{(1)} \times \cdots \times \mathcal{X}^{(K)}$, each $\mathcal{X}^{(k)} \subset \mathbb{R}$. The canonical polyadic (CP)-type decomposition for the noise-free field is

$$x_{\boldsymbol{i}} = \sum_{r=1}^{R} \prod_{k=1}^{K} u_r^{(k)}(i_k)$$

where each $u_r^{(k)} : \mathcal{X}^{(k)} \to \mathbb{R}$ is a latent function. For parsimonious, flexible modeling of $\mathbf{U}^{(k)}(\cdot) = [u_1^{(k)}(\cdot), \ldots, u_R^{(k)}(\cdot)]$, an $R$-output vector of functions, the multioutput Gaussian process prior is

$$\mathbf{U}^{(k)}(\cdot) \sim \mathcal{MGP}\left(\mathbf{0}, \varsigma_k(\cdot, \cdot), \Gamma^{-1}\right)$$

with a positive definite kernel $\varsigma_k$ capturing smoothness over $\mathcal{X}^{(k)}$, and row covariance $\Gamma^{-1} = \mathrm{diag}(\gamma_1^{-1}, \ldots, \gamma_R^{-1})$ controlling power across CP components. The observed data $y_n$ are then

$$y_n \mid \{u_r^{(k)}\}, \tau \sim \mathcal{N}\!\left( \sum_{r=1}^R \prod_{k=1}^K u_r^{(k)}(i_k^{n}),\ \tau^{-1} \right)$$

Each MOGP thus defines a prior over multivariate, mode-specific latent factors, enabling the application of the full machinery of Gaussian process-based function learning for both discrete and continuous-domain tensor settings (Li et al., 25 Dec 2025).
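
A minimal NumPy sketch of this generative model is given below; the RBF kernel, unit-interval grids, and variable names are illustrative assumptions rather than the reference implementation. It draws $R$ factor functions per mode from the GP prior, combines them into the CP field, and adds Gaussian noise of precision $\tau$.

```python
# Illustrative sketch of the generative model above (assumed kernel, grids,
# and names; not the paper's code): draw R factor functions per mode from a
# GP prior on a grid, form the CP sum of outer products, add Gaussian noise.
import numpy as np

def rbf_kernel(x, lengthscale=0.2, jitter=1e-6):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(x))

rng = np.random.default_rng(0)
R, grid_sizes, tau = 3, [30, 30, 30], 100.0   # CP rank, N_k per mode, noise precision
gamma = np.ones(R)                            # per-component precisions gamma_r

factors = []                                  # sampled U^{(k)} in R^{N_k x R}
for N_k in grid_sizes:
    x = np.linspace(0.0, 1.0, N_k)
    Sigma_k = rbf_kernel(x)                   # column covariance from kernel varsigma_k
    U_k = np.column_stack([
        rng.multivariate_normal(np.zeros(N_k), Sigma_k / gamma[r]) for r in range(R)
    ])
    factors.append(U_k)

# Noise-free CP field x_i = sum_r prod_k u_r^{(k)}(i_k), then noisy observations Y
X = np.einsum('ir,jr,kr->ijk', *factors)
Y = X + rng.normal(scale=tau ** -0.5, size=X.shape)
```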

2. Hierarchical Shrinkage Priors and Automatic Rank Determination

Extending basic MOGPs, a hierarchical Bayesian scheme is employed for rank learning. Discretizing $\mathbf{U}^{(k)}(\cdot)$ at grids $\mathcal{S}_k$, let $U^{(k)} \in \mathbb{R}^{N_k \times R}$ be the sampled factor matrix. The prior is:

$$p\left(\{U^{(k)}\}_{k=1}^{K} \mid \boldsymbol{\gamma}\right) = \prod_{k=1}^{K} \mathcal{MN}\left(U^{(k)}; 0, \Sigma_k, \Gamma^{-1}\right)$$

where $\Sigma_k$ derives from $\varsigma_k$. Componentwise, $u_r^{(k)} \sim \mathcal{N}(0, \gamma_r^{-1} \Sigma_k)$, and the shrinkage parameter $\gamma_r$ has a Gamma prior:

$$p(\gamma_r) = \mathrm{Gam}(\gamma_r \mid a_r, b_r)$$

As posterior inference proceeds, some $\gamma_r$ diverge, shrinking the corresponding rank-1 components toward zero; the effective rank is the number of "active" $r$ for which $\gamma_r$ remains finite. This induces a variational form of sparse Bayesian learning over a superposition of MOGP terms, thereby achieving automatic rank selection (Li et al., 25 Dec 2025).
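
A hypothetical sketch of how such pruning could be carried out in code is shown below; the divergence threshold and all names are assumptions for illustration, not the reference implementation.

```python
# Illustrative pruning step (assumed threshold and names): drop rank-1
# components whose expected shrinkage precision E[gamma_r] has diverged,
# i.e. whose factor columns have been shrunk to (numerically) zero.
import numpy as np

def prune_components(factors, E_gamma, threshold=1e6):
    """Keep only the 'active' components r with E[gamma_r] below `threshold`."""
    active = E_gamma < threshold
    return [U_k[:, active] for U_k in factors], E_gamma[active]

# Example: start with R = 5 components; the last two have diverged and are pruned.
factors = [np.random.randn(30, 5) for _ in range(3)]
E_gamma = np.array([1.2, 0.8, 2.5, 1e9, 5e8])
factors, E_gamma = prune_components(factors, E_gamma)
print([U.shape for U in factors])   # -> [(30, 3), (30, 3), (30, 3)]
```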

3. Variational Inference and Efficient Algorithmic Realizations

Inference is performed via mean-field variational Bayes, with the Evidence Lower Bound (ELBO):

$$\mathcal{L}(q) = \mathbb{E}_{q}[\ln p(Y, \Theta)] - \mathbb{E}_q[\ln q(\Theta)]$$

where $q(\Theta) = q(\tau)\, q(\boldsymbol{\gamma}) \prod_{k,r} q(u_r^{(k)})$, and blockwise coordinate ascent updates are derived for each factor. Crucially:

  • $q(u_r^{(k)}) = \mathcal{N}(m_r^{(k)}, \Psi_r^{(k)})$, with closed-form updates for $(m_r^{(k)}, \Psi_r^{(k)})$ that involve expectations over Khatri–Rao products and GP kernel inverses.
  • $q(\gamma_r)$ and $q(\tau)$ remain conjugate Gamma distributions.

The dominant computational cost is inverting $N_k \times N_k$ matrices per mode $k$ and factor $r$, scaling as $\mathcal{O}(R \max_k N_k^3)$. Unused CP terms are pruned as their $\gamma_r$ explode, yielding a practical reduction in complexity and improved convergence (Li et al., 25 Dec 2025).
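
The closed-form factor updates have the usual conditionally linear-Gaussian structure. The schematic below abstracts the expectations over Khatri–Rao products of the other modes into a precomputed design matrix and uses illustrative names throughout; it is a sketch of that structure, not the paper's exact derivation.

```python
# Schematic blockwise update for q(u_r^{(k)}) = N(m, Psi) under a
# conditionally linear-Gaussian model. `A_r` stands in for the expected
# Khatri-Rao-structured design matrix of the other modes; names are assumed.
import numpy as np

def update_factor_column(Sigma_k, E_gamma_r, E_tau, A_r, y_res):
    """Sigma_k: (N_k, N_k) kernel matrix; E_gamma_r, E_tau: expected precisions;
    A_r: (N_obs, N_k) design matrix; y_res: (N_obs,) residual observations."""
    prior_prec = E_gamma_r * np.linalg.inv(Sigma_k)   # prior u_r^{(k)} ~ N(0, gamma_r^{-1} Sigma_k)
    post_prec = prior_prec + E_tau * (A_r.T @ A_r)    # the O(N_k^3) inversions dominate the cost
    Psi = np.linalg.inv(post_prec)
    m = Psi @ (E_tau * (A_r.T @ y_res))
    return m, Psi
```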

4. Universality and Theoretical Properties

If each kernel $\varsigma_k$ defines a universal reproducing kernel Hilbert space (RKHS) on a compact domain, then the CP sum of products of MOGP mean functions can uniformly approximate any continuous function on that domain. Specifically, the RR-FBTC model has the universal approximation property: for any $g \in C(\mathcal{Z})$ and $\epsilon > 0$, there exist parameters such that the posterior mean

$$f(\boldsymbol{x}) = \sum_{r=1}^R \prod_{k=1}^K \bar{u}_r^{(k)}(x_k)$$

satisfies $\| f - g \|_{\infty} < \epsilon$, provided $R$ is sufficiently large. This extends the universality of kernel methods to the tensor-valued, multioutput setting, supporting arbitrarily expressive models for continuous tensor signals (Li et al., 25 Dec 2025).
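
A toy numerical illustration of this property for $K = 2$ is sketched below; truncated SVD stands in for the rank-$R$ sum of products (it is not the RR-FBTC posterior mean), and the target function is an arbitrary smooth choice.

```python
# Toy check: sup-norm error of a rank-R sum of products of 1D factors
# approximating a smooth 2D function, shrinking as R grows. Truncated SVD
# is used as the rank-R approximation for illustration only.
import numpy as np

x = np.linspace(0.0, 1.0, 200)
X1, X2 = np.meshgrid(x, x, indexing='ij')
G = np.sin(3 * X1 + 2 * X2) * np.exp(-X1 * X2)   # arbitrary smooth target g

U, s, Vt = np.linalg.svd(G)
for R in (1, 2, 3, 5):
    F = (U[:, :R] * s[:R]) @ Vt[:R, :]           # rank-R approximation f
    print(R, np.max(np.abs(F - G)))              # ||f - g||_inf decreases with R
```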

5. Empirical Performance and Benchmarking

RR-FBTC and related MOGP-based tensor decomposition frameworks demonstrate state-of-the-art results across synthetic and real-world data. Empirical evaluations include:

  • Synthetic tensors (e.g., $30 \times 30 \times 30$, various ranks), US-Temperature, 3D sound-speed, and image inpainting.
  • Key metrics: relative root square error (RRSE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM).
  • RR-FBTC consistently achieves lower RRSE and RMSE than Bayesian CP, neural network CP, and single-output GP baselines, with superior rank recovery even at high noise and low observation rates.
  • The learned basis functions from RR-FBTC reflect physically meaningful patterns (e.g., latitude/longitude temperature variation) and are robust to the initial rank overparameterization.
  • Computationally, RR-FBTC exhibits competitive or superior run-times relative to continuous-CP neural methods and earlier functional-GP approaches (Li et al., 25 Dec 2025).

Other Bayesian tensor models generalize or complement MOGP-based approaches:

  • Global–local priors such as the Horseshoe (“one-group” priors) induce rank-sparse decomposition via heavy-tailed shrinkage on CP or TT components, achieving “tuning-free” model adaptation and strong finite-sample performance (Gilbert et al., 2019).
  • Bayesian tensor train (TT) factorization with Gaussian-product-Gamma hyperpriors supports automatic slicing-rank determination and scalable variational inference (Xu et al., 2020).
  • Hierarchical sparsity-inducing priors on non-CP representations, e.g., re-weighted Laplace or mixture-of-Gaussians models, provide mechanisms for modeling non-low-rank residual structure and outlier effects (Zhang et al., 2017).

These developments integrate MOGP machinery with probabilistic low-rank tensor learning, rank regularization, and flexible prior architectures.


References:

  • "When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning" (Li et al., 25 Dec 2025)
  • "Tuning Free Rank-Sparse Bayesian Matrix and Tensor Completion with Global-Local Priors" (Gilbert et al., 2019)
  • "Beyond Low Rank: A Data-Adaptive Tensor Completion Method" (Zhang et al., 2017)
  • "Tensor Train Factorization and Completion under Noisy Data with Prior Analysis and Rank Estimation" (Xu et al., 2020)
  • "Rank regularization and Bayesian inference for tensor completion and extrapolation" (Bazerque et al., 2013)
