Latent Similarity Gaussian Processes

Updated 29 November 2025
  • LSGPs are nonparametric Bayesian models that extend Gaussian Processes by incorporating low-dimensional latent variables to parameterize similarity among outputs.
  • They improve multi-task learning by modeling correlations in latent space, which enables effective information sharing and enhanced interpretability.
  • Scalable inference methods such as variational inference and expectation propagation allow LSGPs to handle heterogeneous and mixed-type data efficiently.

Latent Similarity Gaussian Processes (LSGPs) are a family of nonparametric Bayesian models in which similarity, clustering, or correlation between tasks, objects, categories, or individuals is parameterized through shared or learned latent representations. The central concept is to modify the Gaussian Process (GP) covariance structure such that the similarity between outputs is governed or modulated by continuous, low-dimensional latent variables—either learned or introduced for interpretability and transfer. This paradigm encompasses a broad class of models, including multi-task GPs with latent factor decompositions, mixed-variable metamodels, patient-specific time-series models, and supervised hashing frameworks. LSGPs enable data-driven discovery of meaningful latent relations and facilitate information sharing via learned similarity in latent space (Ruan et al., 2017, Oune et al., 2021, Ozdemir et al., 2016, Hang et al., 22 Nov 2025, Bodin et al., 2017).

1. Definition and Model Family

LSGPs generalize standard Gaussian Processes by introducing explicit latent variables that modulate output covariances. For a set of $N$ data points, each associated with input $x_n$ and response $y_n$, standard GPs model $\{y_n\}$ as noisy draws from a function $f(x)$ with a kernel $k_\theta(x, x')$. LSGPs augment this by:

  • Introducing a latent variable $z_n$ (interpretable as a patient, task, code, or categorical embedding).
  • Forming composite inputs $\hat{x}_n := [\,x_n\,;\,z_n\,]$.
  • Defining a covariance $k(\hat{x}_n, \hat{x}_{n'})$ with structure enforcing that correlation increases as $z_n$ and $z_{n'}$ approach each other in latent space.
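
A minimal NumPy sketch of this construction (the toy data, random latent codes, and the `rbf` helper are illustrative assumptions, not code from the cited papers): composite inputs are formed by concatenation, and a squared-exponential kernel over them assigns larger covariance to outputs whose latent codes lie close together.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between row-stacked inputs a (N, d) and b (M, d)."""
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
N = 5
X = rng.normal(size=(N, 2))              # observed inputs x_n
Z = rng.normal(size=(N, 1))              # latent variables z_n (random here; learned in an LSGP)

X_hat = np.concatenate([X, Z], axis=1)   # composite inputs x_hat_n = [x_n ; z_n]
K = rbf(X_hat, X_hat)                    # covariance grows as z_n and z_{n'} get closer
print(K.shape)                           # (5, 5)
```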

Notable LSGP subtypes include:

| Type | Latent Structure | Purpose |
| --- | --- | --- |
| Multi-task GPs (EM-GPR) | Task GPs mixed by latent weights | Task similarity |
| Latent Map GPs (LMGP) | Category manifold | Mixed-type modeling |
| Patient-similarity GPs | Patient embeddings | Borrowing strength |
| Hashing (GPH) | Code bits from GP posteriors | Semantic similarity |
| Latent GP regression | Component-wise latent gating | Modality separation |

LSGPs differ from ad-hoc correlation models by learning the latent geometry from data, either via maximum likelihood (Oune et al., 2021, Ruan et al., 2017), variational inference (Hang et al., 22 Nov 2025, Bodin et al., 2017), or expectation-propagation (Ozdemir et al., 2016).

2. Covariance Construction via Latent Similarity

The defining operation in an LSGP is specifying the covariance function in terms of latent similarity. Patterns include:

  • Task-latent decomposition (EM-GPR):

$$K_y = \sum_{q=1}^{Q} \bigl(w_q w_q^{\top}\bigr) \otimes K_q + \Sigma_y$$

where $w_{dq}$ mixes latent processes $f_q$ across tasks, and task similarity $\sim w_d^{\top} w_{d'}$ (Ruan et al., 2017).
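
As a concrete illustration, the NumPy sketch below (dimensions and placeholder kernels are assumptions for the example, not the paper's code) assembles $K_y$ from latent mixing weights and reads task–task similarity off $W W^{\top}$.

```python
import numpy as np

rng = np.random.default_rng(1)
D, Q, N = 3, 2, 4                         # tasks, latent processes, inputs per task
W = rng.normal(size=(D, Q))               # latent mixing weights w_{dq}
K_q = [np.eye(N) for _ in range(Q)]       # per-process input covariances K_q (placeholders)
Sigma_y = 0.1 * np.eye(D * N)             # output noise Sigma_y

# K_y = sum_q (w_q w_q^T) (x) K_q + Sigma_y, with (x) the Kronecker product
K_y = sum(np.kron(np.outer(W[:, q], W[:, q]), K_q[q]) for q in range(Q)) + Sigma_y

task_similarity = W @ W.T                 # task-task affinity ~ w_d^T w_{d'}
```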

  • Augmented input kernels (LMGP, Patient LSGP):

$$k\bigl((x,t),(x',t')\bigr) = k^x(x,x') \times k^z\bigl(z(t),z(t')\bigr)$$

with $z(t)$ learned via a map $A$, and similarity $s(t,t') = \exp\bigl(-\tfrac12\|z(t)-z(t')\|^2\bigr)$ (Oune et al., 2021, Hang et al., 22 Nov 2025).
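
A minimal sketch of this product kernel, assuming a linear latent map `A` that assigns one latent coordinate per category (all names, sizes, and random values are illustrative):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(2)
n_categories, latent_dim = 4, 2
A = rng.normal(size=(n_categories, latent_dim))   # latent map A (learned in practice)

X = rng.normal(size=(6, 3))                       # continuous inputs x
t = np.array([0, 1, 1, 2, 3, 0])                  # categorical labels t
Z = A[t]                                          # latent coordinates z(t)

K = rbf(X, X) * rbf(Z, Z)                         # k = k^x(x, x') * k^z(z(t), z(t'))

# Pairwise category similarity s(t, t') = exp(-1/2 ||z(t) - z(t')||^2)
diff = A[:, None, :] - A[None, :, :]
S = np.exp(-0.5 * np.sum(diff**2, axis=-1))
```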

  • Latent gating (Latent GP Regression):

$$k_{\mathrm{fact}}\bigl((x,z),(x',z')\bigr) = \sum_{\ell=0}^{L} \varphi\bigl(z^{(\ell)}\bigr)\,\varphi\bigl(z'^{(\ell)}\bigr)\, k_\ell(x,x')$$

where $\varphi$ is an annealed softmax over simplex coordinates, partitioning the data into components (Bodin et al., 2017).
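
A sketch of the gating construction, with an illustrative annealed softmax and per-component RBF base kernels (the temperature, lengthscales, and random latent coordinates are assumptions made for the example):

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Annealed softmax; a lower temperature pushes gate weights toward simplex corners."""
    e = np.exp((z - np.max(z, axis=-1, keepdims=True)) / temperature)
    return e / e.sum(axis=-1, keepdims=True)

def rbf(a, b, lengthscale):
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(3)
N, L = 6, 3                                  # data points, components
X = rng.normal(size=(N, 2))
Z = rng.normal(size=(N, L))                  # unconstrained latent coordinates
Phi = softmax(Z, temperature=0.5)            # gate weights phi(z^(l)), each row on the simplex
lengthscales = [0.5, 1.0, 2.0]               # one base kernel k_l per component

# k_fact((x,z), (x',z')) = sum_l phi(z^(l)) phi(z'^(l)) k_l(x, x')
K = sum(np.outer(Phi[:, l], Phi[:, l]) * rbf(X, X, lengthscales[l]) for l in range(L))
```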

  • Binary code similarity (GPH):

$$p(s_{i\ell} \mid V_{i\ell}) = \Phi(\sigma_y\, s_{i\ell}\, V_{i\ell}), \qquad V_{i\ell} = b_i^{\top} b_\ell$$

where code similarity drives semantic hashing (Ozdemir et al., 2016).
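
The likelihood can be evaluated directly once codes live in $\{-1,+1\}^m$; in the sketch below the codes, pairwise labels, and $\sigma_y$ are random placeholders rather than outputs of the paper's EP and Gibbs scheme.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n_items, n_bits = 5, 8
B = rng.choice([-1.0, 1.0], size=(n_items, n_bits))   # binary codes b_i in {-1, +1}^m
V = B @ B.T                                            # V_{il} = b_i^T b_l
sigma_y = 0.5
S = rng.choice([-1.0, 1.0], size=(n_items, n_items))   # pairwise similarity labels s_{il}

lik = norm.cdf(sigma_y * S * V)                        # p(s_{il} | V_{il}) = Phi(sigma_y s_{il} V_{il})
```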

These constructions allow LSGPs to capture and exploit similarity patterns not directly observed in the inputs, supporting complex, multi-modal, and mixed-type data.

3. Parameter Learning and Inference Algorithms

LSGP parameter estimation requires joint learning of kernel parameters and latent variables. Crucial algorithmic workflows include:

  • Multi-task EM-GPR: Two-step procedure: (1) independent kernel hyperparameter learning per task, (2) joint optimization of the latent weights $w_q$. Ensemble mini-batch learning further scales inference (Ruan et al., 2017).
  • LMGP: Maximum likelihood estimation over kernel hyperparameters and the latent map $A$ through gradient optimization, with the latent manifold facilitating covariance modeling for categorical inputs (Oune et al., 2021).
  • Latent GP regression: Sparse variational inference, including MC approximation of the intractable $\Psi$-statistics, with latent variables constrained to the simplex (Bodin et al., 2017).
  • Patient LSGP: Stochastic variational approach; treating latent patient embeddings as Gaussian posteriors, maximizing the ELBO over GP and latent parameters with minibatch optimization (Hang et al., 22 Nov 2025).
  • Supervised Hashing (GPH): Expectation-propagation for Gaussian site approximation and parallel Gibbs sampling for binary code inference; sparse pseudo-input GP reduces cost (Ozdemir et al., 2016).

All frameworks support parallelization either over tasks, bits, or mini-batches, and they leverage automatic differentiation for scalable optimization.
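
As one concrete instance of these workflows, the sketch below performs LMGP-style joint maximum-likelihood learning of a lengthscale, a noise variance, and a linear latent map over categories by minimizing the negative GP log marginal likelihood. The toy data, initialization, jitter, and the use of `scipy.optimize.minimize` with finite-difference gradients are assumptions made for a self-contained example, not the cited implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
N, n_cat, d_lat = 30, 3, 2
X = rng.uniform(size=(N, 1))                    # continuous inputs
t = rng.integers(0, n_cat, size=N)              # categorical labels
y = np.sin(6.0 * X[:, 0]) + 0.3 * t + 0.05 * rng.normal(size=N)   # toy responses

def build_K(params):
    """Covariance over composite inputs [x / lengthscale ; z(t)], plus noise and jitter."""
    log_ls, log_noise = params[0], params[1]
    A = params[2:].reshape(n_cat, d_lat)        # latent map: one coordinate per category
    H = np.concatenate([X / np.exp(log_ls), A[t]], axis=1)
    sq = np.sum(H**2, 1)[:, None] + np.sum(H**2, 1)[None, :] - 2.0 * H @ H.T
    return np.exp(-0.5 * sq) + (np.exp(log_noise) + 1e-6) * np.eye(N)

def neg_log_marginal_likelihood(params):
    K = build_K(params)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * N * np.log(2.0 * np.pi)

init = np.concatenate([[np.log(0.2), np.log(1e-2)], 0.1 * rng.normal(size=n_cat * d_lat)])
result = minimize(neg_log_marginal_likelihood, init, method="L-BFGS-B")
A_hat = result.x[2:].reshape(n_cat, d_lat)      # learned latent coordinate per category
```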

4. Interpretability and Latent Similarity Analysis

A hallmark of LSGPs is the interpretable latent embedding extracted post-training:

  • In EM-GPR, the learned weight vectors $w_d$ represent low-dimensional task embeddings. Their dot products or cosine similarities quantify task–task statistical affinity, revealing which outputs can transfer (Ruan et al., 2017).
  • LMGP’s linear map $A$ produces geometric coordinates $z(t)$ for each categorical input, visualizable as clustering or separation on the latent manifold. Pairwise similarities $s(t,t')$ elucidate which categories share response structure (Oune et al., 2021).
  • Patient LSGP recovers covariance matrices $K^z$ over patient embeddings, enabling comparison to demographic groupings; modularity analyses reveal that the discovered similarity often transcends standard covariate splits (Hang et al., 22 Nov 2025).
  • Latent GP regression reports posterior mixture weights for each component, allowing predictions to be attributed to shared or distinct underlying processes and enabling component-wise uncertainty evaluation (Bodin et al., 2017).
  • Hashing GPH links latent binary codes to semantic similarity, which is interpretable via retrieval precision and code structure (Ozdemir et al., 2016).

LSGPs thus serve not only as predictive models but also as exploratory tools for uncovering data-driven similarity relationships.
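
As an example of this kind of post-hoc analysis, the sketch below computes pairwise similarities and an agglomerative clustering of learned latent embeddings; `A_hat` is a random stand-in for coordinates produced by a fitted LSGP, and the choice of two clusters is purely illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

A_hat = np.random.default_rng(6).normal(size=(6, 2))   # latent embedding per task/category/patient

# Pairwise similarities s(t, t') = exp(-1/2 ||z(t) - z(t')||^2)
S = squareform(np.exp(-0.5 * pdist(A_hat, metric="sqeuclidean")))
np.fill_diagonal(S, 1.0)

# Group entities that the model treats as similar (2 clusters chosen for illustration)
labels = fcluster(linkage(pdist(A_hat), method="average"), t=2, criterion="maxclust")
print(labels)
```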

5. Empirical Performance and Scalability

Across diverse applications, LSGPs deliver improved predictive accuracy, variance quantification, and computational tractability:

  • EM-GPR reduces mean-absolute error vs. “no-transfer,” ICM, SLFM, and GPRN. Ensemble EM-GPR further improves regression accuracy and runtime by variance reduction (Ruan et al., 2017).
  • LMGP outperforms classical mixed-type GPs (LVGP, additive-GP, multiplicative-GP) in Bayesian optimization and real-world prediction, handling variable-length inputs without architecture change (Oune et al., 2021).
  • Patient LSGP achieves higher log-likelihood, ROC-AUC, PPV, and sensitivity than pooled or individualized models, especially for low-data regimes; similarity structure is clinically interpretable and robust to demographic confounding (Hang et al., 22 Nov 2025).
  • Latent GP regression more accurately models multi-modal data, non-stationarity, and component-wise variation, yielding lower RMSE and better credible-interval coverage than conventional stationary GPs (Bodin et al., 2017).
  • GPH scales to tens of thousands of images and $m$-bit codes, maintaining high mean average precision in short-code regimes and outperforming state-of-the-art supervised hashing algorithms (Ozdemir et al., 2016).

Computational efficiency is achieved via parallelization, sparse inducing-point methods, and ensemble averaging.

6. Theoretical and Practical Significance

LSGPs provide a unified framework for modeling latent similarity in heterogeneous, structured, and mixed-type data. Their principal advantages include:

  • Principled, likelihood-based inference for latent similarity, supporting uncertainty quantification and probabilistic transfer.
  • Flexibility to encode prior knowledge (via informative priors or kernel choices) and discover latent geometry directly from response structure.
  • Capacity to handle missing data, variable-length categorical inputs, task sharing, and multi-modal outputs without the need for manual featurization or clustering.
  • Scalability to large datasets through parallel and sparse inference techniques.

A plausible implication is that LSGPs are particularly well-suited for low-data and highly heterogeneous settings, where borrowing strength via learned similarity directly enhances generalization and interpretability.

LSGPs subsume and generalize several established methodologies:

  • Intrinsic Coregionalization Model (ICM), SLFM, and GPRN as special cases when latent similarity is fixed or parametric (Ruan et al., 2017).
  • Sparse GP-LVM and mixture-of-experts approaches under the latent gating kernel regime (Bodin et al., 2017).
  • Metric learning for categorical variables via LMGP’s manifold projection, avoiding manual encoding (Oune et al., 2021).
  • Collaborative filtering and multitask regression by interpreting individuals/items as tasks and leveraging learned similarity.

Further avenues include deeper nonlinear mappings for latent manifolds, hierarchical latent similarity structures, and integration with downstream tasks such as Bayesian optimization, dynamic modeling, and sequential decision problems. Controversies may center on latent dimension selection, interpretability in high dimensions, and identifiability under conditional independence.

LSGPs thus represent a robust and extensible class of models for discovering and exploiting latent similarity in modern statistical learning.
