Source of performance gains: isotropic prior choice versus Euclidean gradient dynamics

Determine whether, in backbone–projector self-supervised learning architectures that impose an isotropic prior distribution (such as a Gaussian or Laplace prior) on projector outputs while using only backbone features at inference, the observed downstream performance gains are attributable to the specific choice of isotropic prior or to the generally favorable learning dynamics of Euclidean gradients.

Background

The paper studies kernel-based regularization for Euclidean self-supervised learning with backbone–projector architectures, where training constrains the geometry of projector outputs but inference discards the projector and uses backbone features. The authors compare Gaussian and Laplace isotropic priors and observe negligible differences in downstream classification performance in non-sliced experiments.
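The sketch below illustrates this train-time/inference-time decoupling under illustrative assumptions: a PyTorch-style setup in which projector outputs are pushed toward an isotropic prior with an RBF-kernel MMD term. The layer sizes, the `rbf_mmd2` estimator, and the `prior_regularizer` helper are hypothetical and are not the paper's implementation; the SSL prediction loss itself is omitted.

```python
import torch
import torch.nn as nn

class BackboneProjector(nn.Module):
    """Backbone features are kept for inference; projector outputs are only regularized."""
    def __init__(self, in_dim=128, feat_dim=64, proj_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU(),
                                      nn.Linear(feat_dim, feat_dim))
        self.projector = nn.Sequential(nn.Linear(feat_dim, proj_dim), nn.ReLU(),
                                       nn.Linear(proj_dim, proj_dim))

    def forward(self, x):
        h = self.backbone(x)   # used downstream at inference
        z = self.projector(h)  # matched to an isotropic prior during training
        return h, z

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased squared MMD between two samples under an RBF kernel (illustrative estimator)."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def prior_regularizer(z, prior="gaussian"):
    """Match projector outputs to samples from an isotropic Gaussian or Laplace prior."""
    if prior == "gaussian":
        ref = torch.randn_like(z)
    elif prior == "laplace":
        ref = torch.distributions.Laplace(0.0, 1.0).sample(z.shape)
    else:
        raise ValueError(prior)
    return rbf_mmd2(z, ref)

model = BackboneProjector()
x = torch.randn(256, 128)
h, z = model(x)
# Only the prior-matching term is shown; the full SSL objective would add a prediction loss.
loss = prior_regularizer(z, prior="gaussian")  # swap in "laplace" to compare priors
loss.backward()
# At inference the projector is discarded and only the backbone features h are used.
```

In this toy setup, switching the `prior` argument between `"gaussian"` and `"laplace"` changes only the regularization target, mirroring the comparison whose downstream effect the paper reports as negligible.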

This decoupling of training-time regularization from inference-time features, together with the observed insensitivity to the choice of prior, raises uncertainty about what primarily drives performance: the explicit choice of isotropic prior for projector outputs or the broader learning dynamics induced by Euclidean gradients in these architectures. Resolving this question would guide the design of regularizers and priors in Euclidean SSL.

References

Experimentally, we observe that the choice between Gaussian and Laplace priors on non-sliced experiments had negligible impact on downstream classification performance. Consequently, it remains unclear whether performance gains stem from the specific choice of isotropic prior or simply from the favorable learning dynamics of Euclidean gradients.

KerJEPA: Kernel Discrepancies for Euclidean Self-Supervised Learning (arXiv:2512.19605, Zimmermann et al., 22 Dec 2025), Section 6 (Discussion), Representation geometry paragraph