Gaussian Process Self-Supervised Learning

Updated 11 December 2025
  • GPSSL is a framework that fuses Gaussian Process priors with self-supervised learning to generate smooth, non-collapsing representations.
  • It leverages kernel-driven invariance to replace explicit data augmentations, resulting in robust and calibrated uncertainty estimates.
  • Empirical evaluations show that GPSSL improves accuracy, ROC-AUC, and risk–coverage metrics across synthetic, tabular, and biomedical tasks.

Gaussian Process Self-Supervised Learning (GPSSL) is a machine learning framework that integrates Gaussian Process (GP) priors with self-supervised representation learning objectives. GPSSL addresses challenges in traditional self-supervised learning (SSL), including the difficulty of generating positive sample pairs and the lack of rigorous uncertainty quantification, by leveraging the probabilistic structure inherent in GPs. This approach yields representations with smoothness, non-collapse, and robust uncertainty properties, making it suitable for a wide variety of downstream tasks, including those demanding calibrated confidence measures and consistent out-of-distribution behavior (Duan et al., 10 Dec 2025).

1. Formal Problem Framework

GPSSL operates on an unlabeled dataset X = \{x_1, \ldots, x_N\} \subset \mathcal{X}, targeting the construction of a representation mapping f_z: \mathcal{X} \to \mathbb{R}^J, where f_z(x) returns a J-dimensional embedding for any x \in \mathcal{X}. Unlike deterministic SSL, the goal is to learn a posterior distribution over f_z that enforces smoothness in the embedding space, prevents degenerate collapse of representations, and provides explicit posterior uncertainty for each representation.

The framework combines:

  • A GP prior on f_z,
  • A generalized likelihood (a self-supervised loss \ell(Z), requiring no labels), forming a generalized Bayesian posterior over f_z.

2. Gaussian Process Prior on the Representation Map

A zero-mean vector-valued GP prior is imposed on f_z:

p(f_z) = \mathrm{GP}(0, K(\cdot, \cdot))

For any finite set X, the stacked representations Z := f_z(X) are distributed as a multivariate normal N(0, K(X, X)) for each representation dimension. The kernel K(x, x') can be any positive-definite function, notably the RBF (squared-exponential) kernel:

K(x, x') = \sigma^2 \exp\left(-\frac{1}{2}(x - x')^T L^{-2} (x - x')\right)

where L is a lengthscale matrix and \sigma^2 is a variance parameter. Structured kernels (e.g., string kernels, graph kernels) can be incorporated for non-vectorial data types (Duan et al., 10 Dec 2025).
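As an illustration, the RBF kernel above can be computed for a batch of vector inputs as follows. This is a minimal NumPy sketch assuming a diagonal lengthscale matrix L; the function name is ours, not from the paper:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscales, variance=1.0):
    """K(x, x') = sigma^2 * exp(-0.5 * (x - x')^T L^{-2} (x - x'))
    for a diagonal lengthscale matrix L = diag(lengthscales)."""
    A = X1 / lengthscales          # scale each input dimension by its lengthscale
    B = X2 / lengthscales
    # Pairwise squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))

X = np.random.default_rng(0).normal(size=(6, 3))
K = rbf_kernel(X, X, lengthscales=np.array([1.0, 2.0, 0.5]), variance=1.5)
```

The resulting Gram matrix is symmetric positive semi-definite with K(x, x) = σ² on the diagonal; for non-vectorial inputs a string or graph kernel would be substituted at this point.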

3. Generalized Bayesian Posterior and SSL Loss Construction

In the absence of labels, the traditional likelihood is replaced by a loss function inspired by VICReg:

\ell(Z) = c_V(Z) + c_C(Z)

where

c_V(Z) = \frac{1}{J}\sum_{j=1}^{J} \max\big(0, \gamma - \sqrt{\mathrm{Var}(z^j) + \epsilon}\big)

c_C(Z) = \sum_{j \neq j'} \big[C(Z)\big]_{j,j'}^2, \quad C(Z) = \frac{1}{N-1}\sum_{i=1}^N (z_i - \bar z)(z_i - \bar z)^T

The generalized posterior is:

\tilde{p}(f_z \mid X) \propto p(f_z)\, \exp\{-\ell(Z = f_z(X))\}

The negative log-posterior (up to a constant) becomes:

-\log \tilde{p}(f_z \mid X) = \frac{1}{2} Z^T K^{-1} Z + \ell(Z)

Hence, the empirical objective optimizes for both the VICReg-style loss and the GP prior regularization:

\min_{f_z}\; \ell(f_z(X)) + \frac{1}{2} f_z(X)^T K^{-1} f_z(X)

This objective integrates the variance and covariance penalties of VICReg with the smoothness and structure imposed by the GP prior (Duan et al., 10 Dec 2025).
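The combined objective above can be sketched directly in NumPy as a (non-variational) MAP criterion. The loss follows the c_V/c_C definitions in the text; γ, ε, and the numerical jitter are hyperparameters we introduce, and function names are not from the paper:

```python
import numpy as np

def vicreg_style_loss(Z, gamma=1.0, eps=1e-4):
    """Variance hinge c_V plus off-diagonal covariance penalty c_C (no invariance term)."""
    N, J = Z.shape
    std = np.sqrt(Z.var(axis=0) + eps)
    c_v = np.mean(np.maximum(0.0, gamma - std))    # push each dimension's std toward gamma
    Zc = Z - Z.mean(axis=0)
    C = (Zc.T @ Zc) / (N - 1)                      # empirical covariance of the embeddings
    c_c = np.sum((C - np.diag(np.diag(C)))**2)     # decorrelate embedding dimensions
    return c_v + c_c

def gpssl_map_objective(Z, K, jitter=1e-6):
    """VICReg-style loss plus the GP prior term 0.5 * sum_j z_j^T K^{-1} z_j."""
    N = K.shape[0]
    prior = 0.5 * np.sum(Z * np.linalg.solve(K + jitter * np.eye(N), Z))
    return vicreg_style_loss(Z) + prior

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 2))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None, :])**2, axis=-1))  # unit-lengthscale RBF
Z = rng.normal(size=(32, 4))
value = gpssl_map_objective(Z, K)
```

In practice Z would be parameterized (e.g., by a network or by the GP posterior itself) and this objective minimized over its parameters; the sketch only evaluates it.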

4. Invariance via Kernel Structure and Relations to Existing Methods

GPSSL subsumes the need for hand-crafted positive-pair augmentations central to contrastive and non-contrastive SSL. In contrastive SSL, invariance between pairs is enforced by terms like \ell_I(Z, Z') = (1/N)\sum_i \|z_i - z'_i\|^2; in GPSSL, such invariance is imposed implicitly:

  • The GP prior term \frac{1}{2} Z^T K^{-1} Z encourages f_z(x) \approx f_z(x') when K(x, x') is large; the kernel's affinity structure thus enforces similarity.
  • If the kernel explicitly couples only designated pairs, the prior imposes the equivalent of a pairwise invariance penalty.

This design allows GPSSL to function without explicit data augmentation or negative samples, generalizing invariance beyond user-specified pairs (Duan et al., 10 Dec 2025).
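This implicit invariance can be seen in a toy two-point example: when the kernel strongly couples two inputs (a stand-in for an augmentation pair), embeddings that disagree incur a far larger prior penalty than embeddings that agree. The 2×2 kernel below is illustrative, not from the paper:

```python
import numpy as np

def prior_penalty(z, K):
    """GP prior term 0.5 * z^T K^{-1} z for a single embedding dimension."""
    return 0.5 * float(z @ np.linalg.solve(K, z))

rho = 0.99                                   # high affinity K(x, x') for the coupled pair
K = np.array([[1.0, rho], [rho, 1.0]])
agree = prior_penalty(np.array([1.0, 1.0]), K)       # f_z(x) == f_z(x')
disagree = prior_penalty(np.array([1.0, -1.0]), K)   # embeddings pulled apart
```

For this K the agreeing pair costs 1/(1+ρ) ≈ 0.5 while the disagreeing pair costs 1/(1−ρ) = 100, so a high-affinity kernel entry acts like a strong pairwise invariance penalty with no explicit augmentation loss.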

5. Connections to Kernel PCA and VICReg

GPSSL bridges neural SSL objectives (such as VICReg) and spectral unsupervised methods (such as kernel PCA).

  • VICReg: VICReg's objective is typically \ell_{\mathrm{VICReg}}(Z, Z') = c_I(Z, Z') + c_V(Z) + c_C(Z). GPSSL retains c_V and c_C, replacing the invariance term c_I with a GP prior.
  • Kernel PCA (kPCA): For J = 1, replacing c_V(Z) by -\mathrm{Var}(Z) and omitting c_C(Z), the MAP solution of GPSSL coincides with the leading kernel PCA component. Thus, GPSSL smoothly interpolates between modern non-contrastive SSL and classical kernel methods, subsuming both as limiting cases (Duan et al., 10 Dec 2025).

6. Uncertainty Quantification and Downstream Propagation

GPSSL enables a fully Bayesian treatment of representations:

  • Posterior at test points: Given a new x^*, the predictive mean and covariance take the standard GP regression form:

\mu_z(x^*) = K(x^*, X) K(X, X)^{-1} Z, \quad \sigma^2_z(x^*) = K(x^*, x^*) - K(x^*, X) K(X, X)^{-1} K(X, x^*)

  • Variational inference: Inducing points and a variational distribution q(U_z) yield an approximate posterior via an ELBO:

\mathrm{ELBO} = -\mathbb{E}_q[\ell(Z)] - \mathrm{KL}[q(U_z) \,\|\, p(U_z)]

  • Uncertainty propagation: For downstream supervised tasks, uncertainty in Z is propagated via Monte Carlo integration:

p(Y \mid X) = \int p(Y \mid Z)\, \tilde{p}(Z \mid X)\, dZ \approx \frac{1}{M} \sum_{m=1}^{M} p(Y \mid Z^{(m)})

where Z^{(m)} \sim \tilde{p}(Z \mid X). This yields both “GPSSL-mean” (using the posterior mean only) and “GPSSL-full” (sampling from the posterior for full Bayesian averaging) variants.
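Both steps above, the GP predictive posterior at new points and the Monte Carlo propagation into a downstream model, can be sketched together in NumPy. The linear-softmax head and all variable names are illustrative stand-ins for whatever downstream predictor is used:

```python
import numpy as np

def rbf(A, B):
    d = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d)

def gp_predict(X_train, Z, X_test, jitter=1e-6):
    """Predictive mean and (diagonal) variance at X_test, standard GP regression form."""
    K = rbf(X_train, X_train) + jitter * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    mu = Ks @ np.linalg.solve(K, Z)                               # K(*,X) K^{-1} Z
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)   # diag of the posterior cov
    return mu, np.maximum(var, 0.0)[:, None]

def softmax_head(Z, W):
    """Toy downstream classifier p(Y|Z): a fixed linear map followed by softmax."""
    logits = Z @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X_train, Z = rng.normal(size=(30, 2)), rng.normal(size=(30, 3))
X_test, W = rng.normal(size=(5, 2)), rng.normal(size=(3, 2))

mu, var = gp_predict(X_train, Z, X_test)
p_mean = softmax_head(mu, W)                      # "GPSSL-mean": plug in the mean only
M = 200                                           # "GPSSL-full": average over M samples
p_full = np.mean([softmax_head(mu + np.sqrt(var) * rng.standard_normal(mu.shape), W)
                  for _ in range(M)], axis=0)
```

Averaging over posterior samples lets the embedding uncertainty widen the downstream predictive distribution, which is the mechanism behind the selective-classification behavior discussed here.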

A plausible implication is that the uncertainty quantification inherent in GPSSL provides calibrated selective classification and risk control, surpassing kernel PCA and non-Bayesian SSL in this regard (Duan et al., 10 Dec 2025).

7. Empirical Evaluation and Observed Performance

Experimental results validate GPSSL across synthetic, tabular, and biomedical domains:

| Task Type | Evaluation Metrics | GPSSL Empirical Outcome |
|---|---|---|
| Synthetic (circles) | AURC, accuracy, risk–coverage | GPSSL-full yields the lowest AURC and the best accuracy at fixed coverage, compared to kPCA, VICReg, and GPSSL-mean |
| UCI tabular (4 datasets) | Accuracy, ROC-AUC, AURC | GPSSL-full attains best or near-best accuracy and ROC-AUC, with consistently lower AURC than competitors |
| Spatial transcriptomics (semi-synthetic and real) | pMSE, accuracy, risk–coverage | GPSSL embeddings (with a Bayesian neural network head) recover correct spatial maps with calibrated uncertainties; best quantitative risk–coverage and pMSE |

Overall, GPSSL eliminates hand-crafted data augmentation and negative pairs by encoding similarity through the kernel K, provides Bayesian-calibrated uncertainties, and delivers improvements in both generalization accuracy and selective risk control compared to established benchmarks such as VICReg and kernel PCA (Duan et al., 10 Dec 2025).

Parallel research demonstrates the flexibility of self-supervised GP frameworks for automatic pseudo-label generation outside of representation learning. In the domain of energy-aware wireless camera control, pseudo-labels derived from low-power detectors facilitate self-supervised GP regression to model probability of detection (POD) as a function of radio signal state. This prediction is incorporated into Bayesian filtering and control schemes that optimize detection probability against energy cost, with significant efficiency and accuracy gains observed in both simulations and real-world deployments (Varotto et al., 2021). This suggests broader potential for GP-based self-supervised paradigms in sensor systems, model-based control, and beyond.
