Multioutput Gaussian Processes
- Multioutput Gaussian Processes are nonparametric frameworks that model vector-valued functions by jointly capturing correlations among outputs, and serve as priors over latent factors in tensor decomposition.
- They employ hierarchical shrinkage priors to enable automatic rank selection and prune inactive components for efficient modeling.
- Variational inference with closed-form updates ensures scalable learning and robust performance in tensor completion and factorization tasks.
A multioutput Gaussian process (MOGP) is a nonparametric probabilistic framework for modeling vector-valued functions, allowing flexible and joint treatment of correlated outputs or tensor-valued signals over continuous index sets. Within functional tensor decomposition and completion, MOGPs serve as priors over latent factor functions, encoding both intra-factor smoothness and inter-factor dependencies, supporting scalable inference, automatic rank selection, and functional universality.
1. Mathematical Foundations of Multioutput GPs in Tensor Decomposition
Consider a $D$-mode functional tensor $\mathcal{X}$, defined over $\Omega = \Omega_1 \times \cdots \times \Omega_D$, each $\Omega_d \subset \mathbb{R}$. The canonical polyadic (CP)-type decomposition for the noise-free field is
$$\mathcal{X}(x_1,\dots,x_D) \;=\; \sum_{r=1}^{R} \prod_{d=1}^{D} f_r^{(d)}(x_d),$$
where each $f_r^{(d)} : \Omega_d \to \mathbb{R}$ is a latent function. For parsimonious, flexible modeling of $\mathbf{f}^{(d)} = \bigl(f_1^{(d)},\dots,f_R^{(d)}\bigr)^{\top}$, an $R$-output vector of functions, the multioutput Gaussian process prior is
$$\mathbf{f}^{(d)} \sim \mathcal{GP}\!\bigl(\mathbf{0},\; k_d(x_d, x_d')\,\boldsymbol{\Sigma}\bigr), \qquad d = 1,\dots,D,$$
with $k_d(\cdot,\cdot)$ a positive definite kernel capturing smoothness over $\Omega_d$, and row covariance $\boldsymbol{\Sigma} \in \mathbb{R}^{R \times R}$ controlling power across CP components. The observed data are then
$$y_{\mathbf{i}} \;=\; \mathcal{X}(x_{1,i_1},\dots,x_{D,i_D}) + \varepsilon_{\mathbf{i}}, \qquad \varepsilon_{\mathbf{i}} \sim \mathcal{N}(0,\tau^{-1}),$$
for multi-indices $\mathbf{i}$ in a (possibly incomplete) observation set.
Each MOGP thus defines a prior over multivariate, mode-specific latent factors, enabling the application of the full machinery of Gaussian process-based function learning for both discrete and continuous-domain tensor settings (Li et al., 25 Dec 2025).
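As a concrete illustration of the separable (kernel-times-row-covariance) structure above, the following sketch draws correlated factor functions from such a prior on a 1-D grid. The RBF kernel, the particular $\boldsymbol{\Sigma}$, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.2):
    """Gram matrix of an RBF kernel k_d on a 1-D grid (illustrative choice)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def sample_mogp_factors(x_d, Sigma, n_samples=1, jitter=1e-8):
    """Draw R correlated factor functions f_r^(d) on the grid x_d.

    The separable MOGP prior has Cov[f_r(x), f_r'(x')] = Sigma[r, r'] * k_d(x, x'),
    i.e. a Kronecker-structured covariance Sigma (x) K_d over the stacked factors.
    """
    K_d = rbf_kernel(x_d)                      # I_d x I_d smoothness over Omega_d
    cov = np.kron(Sigma, K_d)                  # (R*I_d) x (R*I_d) joint covariance
    cov += jitter * np.eye(cov.shape[0])       # numerical stabilization
    draws = np.random.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_samples)
    R, I_d = Sigma.shape[0], x_d.shape[0]
    return draws.reshape(n_samples, R, I_d)    # each sample: R factor functions on the grid

# Example: R = 3 CP components on a grid of 50 points in [0, 1]
x = np.linspace(0, 1, 50)
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 0.5, 0.1],
                  [0.0, 0.1, 0.2]])            # power/correlation across components
F = sample_mogp_factors(x, Sigma)              # shape (1, 3, 50)
```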
2. Hierarchical Shrinkage Priors and Automatic Rank Determination
Extending basic MOGPs, a hierarchical Bayesian scheme is employed for rank learning. Discretizing $\Omega_d$ at grids $\{x_{d,1},\dots,x_{d,I_d}\}$, let $\mathbf{F}^{(d)} \in \mathbb{R}^{I_d \times R}$ be the sampled factor matrix. The prior is:
$$\mathrm{vec}\bigl(\mathbf{F}^{(d)}\bigr) \sim \mathcal{N}\!\bigl(\mathbf{0},\; \boldsymbol{\Lambda}^{-1} \otimes \mathbf{K}^{(d)}\bigr), \qquad \boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1,\dots,\lambda_R),$$
where $\mathbf{K}^{(d)} = \bigl[k_d(x_{d,i}, x_{d,j})\bigr]_{i,j=1}^{I_d}$ derives from $k_d$ and the row covariance $\boldsymbol{\Sigma}$ is specialized to the diagonal $\boldsymbol{\Lambda}^{-1}$. Componentwise, $\mathbf{f}_r^{(d)} \sim \mathcal{N}\bigl(\mathbf{0},\, \lambda_r^{-1}\mathbf{K}^{(d)}\bigr)$, and the shrinkage parameter $\lambda_r$ has a Gamma prior:
$$\lambda_r \sim \mathrm{Gamma}(a_0, b_0), \qquad r = 1,\dots,R.$$
As posterior inference proceeds, some $\lambda_r$ diverge, shrinking the corresponding rank-1 components towards zero; the effective rank is manifest as the number of “active” components $r$ for which $\lambda_r$ remains finite. This induces a variational form of sparse Bayesian learning over a superposition of MOGP terms; automatic rank selection is thereby achieved (Li et al., 25 Dec 2025).
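A minimal sketch of the rank read-out this implies: after inference, components whose shrinkage precision has diverged are dropped, and the effective rank is the count of the survivors. The threshold and helper function below are hypothetical, chosen only to illustrate the pruning rule.

```python
import numpy as np

def effective_rank(lambda_mean, factors, threshold=1e6):
    """Read off the effective CP rank from the learned shrinkage precisions.

    lambda_mean : array (R,), posterior means E_q[lambda_r]
    factors     : list of D arrays, each (I_d, R), the sampled factor matrices F^(d)
    threshold   : illustrative cutoff (not from the paper); components whose
                  precision has diverged past it are treated as pruned
    """
    active = lambda_mean < threshold                  # "active" components keep finite lambda_r
    pruned_factors = [F_d[:, active] for F_d in factors]
    return int(active.sum()), pruned_factors

# Example: over-specified initial rank R = 8; shrinkage deactivates four components
lam = np.array([2.1, 1e9, 0.7, 5e8, 3.4, 2e7, 9e9, 1.2])
factors = [np.random.randn(I_d, 8) for I_d in (30, 40, 25)]
R_eff, factors = effective_rank(lam, factors)         # R_eff == 4
```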
3. Variational Inference and Efficient Algorithmic Realizations
Inference is performed via mean-field variational Bayes, with the Evidence Lower Bound (ELBO):
$$\mathcal{L}(q) \;=\; \mathbb{E}_{q(\Theta)}\!\bigl[\log p(\mathbf{y}, \Theta)\bigr] \;-\; \mathbb{E}_{q(\Theta)}\!\bigl[\log q(\Theta)\bigr],$$
where $\Theta = \bigl\{\{\mathbf{F}^{(d)}\}_{d=1}^{D},\, \{\lambda_r\}_{r=1}^{R},\, \tau\bigr\}$ collects the latent factors, shrinkage parameters, and noise precision, and blockwise coordinate-ascent updates are derived for each factor. Crucially:
- Each factor posterior $q\bigl(\mathrm{vec}(\mathbf{F}^{(d)})\bigr) = \mathcal{N}\bigl(\boldsymbol{\mu}^{(d)}, \mathbf{V}^{(d)}\bigr)$ is Gaussian, with closed-form updates for $\boldsymbol{\mu}^{(d)}$ and $\mathbf{V}^{(d)}$ that involve expectations over Khatri–Rao products and GP kernel inverses.
- $q(\lambda_r)$ and $q(\tau)$ remain conjugate Gamma distributions.
The dominant computational cost is inverting $I_d \times I_d$ covariance matrices per mode $d$ and factor $r$, scaling as $\mathcal{O}\bigl(R \sum_{d=1}^{D} I_d^{3}\bigr)$ per sweep. Unused CP terms are pruned as their $\lambda_r$ explode, yielding a practical reduction in complexity and improved convergence (Li et al., 25 Dec 2025).
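For concreteness, here is a sketch of the conjugate Gamma update for the shrinkage precisions implied by the prior of Section 2 under a standard mean-field factorization; the hyperparameter values and function signature are illustrative assumptions, and the paper's exact update may differ in detail.

```python
import numpy as np

def update_q_lambda(a0, b0, grid_sizes, E_quad):
    """Conjugate Gamma update for the shrinkage precisions lambda_r.

    With prior f_r^(d) ~ N(0, lambda_r^{-1} K^(d)) across modes d = 1..D and
    lambda_r ~ Gamma(a0, b0), the mean-field posterior q(lambda_r) is Gamma with

        shape_r = a0 + 0.5 * sum_d I_d
        rate_r  = b0 + 0.5 * sum_d E_q[ f_r^(d)^T (K^(d))^{-1} f_r^(d) ]

    grid_sizes : sequence of grid sizes I_d, one per mode
    E_quad     : array (D, R) of the expected quadratic forms above
    """
    shape = a0 + 0.5 * sum(grid_sizes)          # same shape for every component r
    rate = b0 + 0.5 * E_quad.sum(axis=0)        # one rate per component r
    return shape, rate                          # E_q[lambda_r] = shape / rate

# Example: D = 3 modes, R = 5 components (toy expected quadratic forms)
E_quad = np.abs(np.random.randn(3, 5)) * 10
shape, rate = update_q_lambda(a0=1e-6, b0=1e-6, grid_sizes=(30, 40, 25), E_quad=E_quad)
lam_mean = shape / rate                         # feeds the pruning rule sketched above
```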
4. Universality and Theoretical Properties
If each kernel $k_d$ defines a universal reproducing kernel Hilbert space (RKHS) on a compact domain $\Omega_d$, then the CP sum of products of MOGP mean functions can uniformly approximate any continuous function on $\Omega = \Omega_1 \times \cdots \times \Omega_D$. Specifically, the RR-FBTC model has the universal approximation property: for any continuous target $\mathcal{X}^{\star}$ on $\Omega$ and any $\epsilon > 0$, there exist parameters such that the posterior mean
$$\widehat{\mathcal{X}}(x_1,\dots,x_D) \;=\; \sum_{r=1}^{R} \prod_{d=1}^{D} \widehat{f}_r^{(d)}(x_d)$$
satisfies $\sup_{\mathbf{x}\in\Omega}\bigl|\widehat{\mathcal{X}}(\mathbf{x}) - \mathcal{X}^{\star}(\mathbf{x})\bigr| < \epsilon$, provided $R$ is sufficiently large. This extends the universality of kernel methods to the tensor-valued, multioutput setting, supporting arbitrarily expressive models for continuous tensor signals (Li et al., 25 Dec 2025).
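An illustrative numerical check of this statement in the two-mode case (not the paper's proof): discretize a smooth bivariate function and observe how the sup-norm error of its best rank-$R$ sum-of-separable-products approximation, obtained here via truncated SVD, decays as $R$ grows.

```python
import numpy as np

# Target: a continuous function on [0,1]^2 that is not itself rank-1 separable.
x = np.linspace(0, 1, 200)
X1, X2 = np.meshgrid(x, x, indexing="ij")
target = np.sin(3 * np.pi * X1 * X2) + np.exp(-(X1 - X2) ** 2)

# Rank-R approximations by sums of products of univariate (discretized) functions,
# obtained here via truncated SVD of the sampled grid.
U, s, Vt = np.linalg.svd(target, full_matrices=False)
for R in (1, 2, 4, 8, 16):
    approx = (U[:, :R] * s[:R]) @ Vt[:R, :]      # sum_{r<=R} u_r(x1) * v_r(x2)
    print(R, np.max(np.abs(target - approx)))    # sup-norm error shrinks as R grows
```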
5. Empirical Performance and Benchmarking
RR-FBTC and related MOGP-based tensor decomposition frameworks demonstrate state-of-the-art results across synthetic and real-world data. Empirical evaluations include:
- Synthetic tensors of various sizes and ranks, the US-Temperature dataset, 3D sound-speed fields, and image inpainting benchmarks.
- Key metrics: relative root square error (RRSE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM); a minimal computation of RRSE and RMSE is sketched after this list.
- RR-FBTC consistently achieves lower RRSE and RMSE than Bayesian CP, neural network CP, and single-output GP baselines, with superior rank recovery even at high noise and low observation rates.
- The learned basis functions from RR-FBTC reflect physically meaningful patterns (e.g., latitude/longitude temperature variation) and are robust to the initial rank overparameterization.
- Computationally, RR-FBTC exhibits competitive or superior run-times relative to continuous-CP neural methods and earlier functional-GP approaches (Li et al., 25 Dec 2025).
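For reference, a minimal sketch of the RRSE and RMSE metrics listed above, evaluated on held-out entries; the definitions follow common usage, and the paper's exact conventions may differ.

```python
import numpy as np

def rmse(x_true, x_hat):
    """Root mean square error over the evaluated entries."""
    return np.sqrt(np.mean((x_true - x_hat) ** 2))

def rrse(x_true, x_hat):
    """Relative root square error: residual norm over the norm of the ground truth."""
    return np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)

# Example on synthetic held-out test entries of a completed tensor
x_true = np.random.randn(1000)
x_hat = x_true + 0.1 * np.random.randn(1000)
print(rmse(x_true, x_hat), rrse(x_true, x_hat))
```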
6. Model Class Extensions and Related Methodological Developments
Other Bayesian tensor models generalize or complement MOGP-based approaches:
- Global–local priors such as the Horseshoe (“one-group” priors) induce rank-sparse decomposition via heavy-tailed shrinkage on CP or TT components, achieving “tuning-free” model adaptation and strong finite-sample performance (Gilbert et al., 2019).
- Bayesian tensor train (TT) factorization with Gaussian-product-Gamma hyperpriors supports automatic slicing-rank determination and scalable variational inference (Xu et al., 2020).
- Hierarchical sparsity-inducing priors on non-CP representations, e.g., re-weighted Laplace or mixture-of-Gaussians models, provide mechanisms for modeling non-low-rank residual structure and outlier effects (Zhang et al., 2017).
These developments integrate MOGP machinery with probabilistic low-rank tensor learning, rank regularization, and flexible prior architectures.
References:
- "When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning" (Li et al., 25 Dec 2025)
- "Tuning Free Rank-Sparse Bayesian Matrix and Tensor Completion with Global-Local Priors" (Gilbert et al., 2019)
- "Beyond Low Rank: A Data-Adaptive Tensor Completion Method" (Zhang et al., 2017)
- "Tensor Train Factorization and Completion under Noisy Data with Prior Analysis and Rank Estimation" (Xu et al., 2020)
- "Rank regularization and Bayesian inference for tensor completion and extrapolation" (Bazerque et al., 2013)