RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank (2210.02885v3)
Abstract: Joint-Embedding Self-Supervised Learning (JE-SSL) has seen rapid development, with the emergence of many method variations but only a few principled guidelines to help practitioners deploy them successfully. The main reason for this pitfall is that JE-SSL's core principle of not employing any input reconstruction leaves no visual cues of unsuccessful training. Combined with uninformative loss values, this makes it difficult to deploy SSL on a new dataset for which no labels are available to judge the quality of the learned representations. In this study, we develop a simple unsupervised criterion that is indicative of the quality of learned JE-SSL representations: their effective rank. Albeit simple and computationally friendly, this method -- coined RankMe -- allows one to assess the performance of JE-SSL representations, even on different downstream datasets, without requiring any labels. A further benefit of RankMe is that it has no training phase and no hyper-parameters to tune. Through thorough empirical experiments involving hundreds of training episodes, we demonstrate how RankMe can be used for hyper-parameter selection with nearly no reduction in final performance compared to the current selection methods that involve a dataset's labels. We hope that RankMe will facilitate the deployment of JE-SSL in domains that do not have the opportunity to rely on labels for assessing representation quality.
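Since the abstract characterizes RankMe as the effective rank of the learned representations, a minimal sketch of that computation may be helpful. It follows the standard effective-rank definition (the exponential of the Shannon entropy of the normalized singular-value spectrum); the choice of PyTorch, the function name `rankme`, and the `eps` smoothing constant are illustrative assumptions, not the paper's reference implementation.

```python
import torch


def rankme(embeddings: torch.Tensor, eps: float = 1e-7) -> float:
    """Effective rank of an (N, d) matrix of embeddings.

    Exponential of the Shannon entropy of the normalized
    singular-value distribution. Label-free and parameter-free,
    apart from the numerical-stability constant `eps` (an assumption).
    """
    # Singular values of the embedding matrix.
    s = torch.linalg.svdvals(embeddings.float())
    # Normalize the spectrum into a probability distribution.
    p = s / (s.sum() + eps)
    # exp(entropy) ranges from ~1 (dimensional collapse)
    # up to min(N, d) (perfectly uniform spectrum).
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy).item()


if __name__ == "__main__":
    z = torch.randn(1024, 256)  # stand-in for pretrained embeddings
    print(f"RankMe: {rankme(z):.1f}")  # close to 256 for random Gaussian data
```

In this sketch a higher score indicates a richer, less collapsed representation; the hyper-parameter selection use case described in the abstract then amounts to computing this score on unlabeled data for each candidate run and keeping the run with the largest effective rank.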