Rethinking The Uniformity Metric in Self-Supervised Learning (2403.00642v2)
Abstract: Uniformity plays an important role in evaluating learned representations, providing insights into self-supervised learning. In our quest for effective uniformity metrics, we pinpoint four principled properties that such metrics should possess. Namely, an effective uniformity metric should remain invariant to instance permutations and sample replications while accurately capturing feature redundancy and dimensional collapse. Surprisingly, we find that the uniformity metric proposed by \citet{Wang2020UnderstandingCR} fails to satisfy the majority of these properties. Specifically, their metric is sensitive to sample replications and fails to correctly account for feature redundancy and dimensional collapse. To overcome these limitations, we introduce a new uniformity metric based on the Wasserstein distance, which satisfies all the aforementioned properties. Integrating this new metric into existing self-supervised learning methods effectively mitigates dimensional collapse and consistently improves their performance on downstream tasks involving the CIFAR-10 and CIFAR-100 datasets. Code is available at \url{https://github.com/statsle/WassersteinSSL}.
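For concreteness, below is a minimal sketch (not the authors' released implementation) of the two metrics the abstract contrasts. The \citet{Wang2020UnderstandingCR} metric is the log of the mean pairwise Gaussian potential over L2-normalized representations. The Wasserstein-based alternative is sketched here, under the assumption that it is computed as the closed-form quadratic Wasserstein distance between a Gaussian fitted to the representations and $\mathcal{N}(0, I/m)$, which approximates the uniform distribution on the unit hypersphere $\mathcal{S}^{m-1}$ in high dimensions (closed form from Olkin & Pukelsheim, 1982). Function names and the temperature $t=2$ are illustrative choices.

```python
# Sketch of the two uniformity metrics; assumes `features` is an (n, m) matrix of
# L2-normalized representations lying on the unit hypersphere S^{m-1}.
import torch


def uniformity_wang_isola(features: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Metric of Wang & Isola (2020): log of the mean pairwise Gaussian potential
    exp(-t * ||z_i - z_j||^2); lower values indicate a more uniform distribution."""
    sq_dists = torch.pdist(features, p=2).pow(2)  # pairwise squared Euclidean distances
    return sq_dists.mul(-t).exp().mean().log()


def uniformity_wasserstein(features: torch.Tensor) -> torch.Tensor:
    """Wasserstein-based sketch: quadratic Wasserstein distance between N(mu, Sigma),
    a Gaussian fit to the representations, and N(0, I/m), a high-dimensional proxy
    for the uniform distribution on S^{m-1}; lower values indicate more uniformity."""
    n, m = features.shape
    mu = features.mean(dim=0)                       # sample mean
    centered = features - mu
    sigma = centered.T @ centered / n               # sample covariance (m x m)
    # tr(Sigma^{1/2}) via the eigenvalues of the symmetric PSD covariance matrix
    eigvals = torch.linalg.eigvalsh(sigma).clamp(min=0)
    trace_sqrt_sigma = eigvals.sqrt().sum()
    # Closed form: W_2^2(N(mu, Sigma), N(0, I/m))
    #   = ||mu||^2 + tr(Sigma) + 1 - (2 / sqrt(m)) * tr(Sigma^{1/2})
    w2_sq = mu.pow(2).sum() + eigvals.sum() + 1.0 - (2.0 / m ** 0.5) * trace_sqrt_sigma
    return w2_sq.clamp(min=0).sqrt()


if __name__ == "__main__":
    torch.manual_seed(0)
    z = torch.nn.functional.normalize(torch.randn(1024, 256), dim=1)
    print(uniformity_wang_isola(z).item(), uniformity_wasserstein(z).item())
```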
- A theoretical analysis of contrastive unsupervised representation learning. In ICML, 2019.
- VICReg: Variance-invariance-covariance regularization for self-supervised learning. In ICLR, 2022.
- A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 1943.
- Unsupervised learning of visual features by contrasting cluster assignments. In NeurIPS, 2020.
- Emerging properties in self-supervised vision transformers. In ICCV, 2021.
- The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12:805–849, 2012.
- A simple framework for contrastive learning of visual representations. In ICML, 2020.
- Exploring simple siamese representation learning. In CVPR, 2021.
- Universally optimal distribution of points on spheres. Journal of the American Mathematical Society, 2007.
- Solo-learn: A library of self-supervised methods for visual representation learning. JMLR, 2022.
- With a little help from my friends: Nearest-neighbor contrastive learning of visual representations. In ICCV, 2021.
- SimCSE: Simple contrastive learning of sentence embeddings. ArXiv, 2021.
- Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS, 2020.
- Dimensionality reduction by learning an invariant mapping. In CVPR, 2006.
- Deep residual learning for image recognition. In CVPR, 2016.
- Momentum contrast for unsupervised visual representation learning. In CVPR, 2020.
- Distilling the knowledge in a neural network. ArXiv, abs/1503.02531, 2015.
- On feature decorrelation in self-supervised learning. In ICCV, 2021.
- Understanding dimensional collapse in contrastive self-supervised learning. In ICLR, 2022.
- Prototypical contrastive learning of unsupervised representations. In ICLR, 2021.
- Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm. In ICLR, 2022.
- Information theory and statistics. Journal of the American Statistical Association, 54:825, 1959.
- SGDR: Stochastic gradient descent with warm restarts. In ICLR, 2017.
- The distance between two random vectors with given dispersion matrices. Linear Algebra and its Applications, 48:257–263, 1982.
- Learning transferable visual models from natural language supervision. In ICML, 2021.
- Understanding self-supervised learning dynamics without contrastive pairs. In ICML, 2021.
- Ramon Van Handel. Probability in high dimension. Lecture Notes (Princeton University), 2016.
- Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In ICML, 2020.
- Dense contrastive learning for self-supervised visual pre-training. In CVPR, 2021.
- DetCo: Unsupervised contrastive learning for object detection. In ICCV, 2021.
- Instance localization for self-supervised detection pretraining. In CVPR, 2021.
- Scaling SGD batch size to 32K for ImageNet training. ArXiv, 2017.
- Barlow twins: Self-supervised learning via redundancy reduction. In ICML, 2021.
- How does SimSiam avoid collapse without negative samples? A unified understanding with self-supervised contrastive learning. In ICLR, 2022a.
- Zero-CL: Instance and feature decorrelation for negative-free symmetric contrastive learning. In ICLR, 2022b.
- Contrastive learning for label efficient semantic segmentation. In ICCV, 2021.