Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning (2312.01118v1)
Abstract: Self-supervised metric learning has recently attracted attention for its potential to learn a generic distance function, overcoming limitations of conventional supervised learning such as scalability and label bias. Despite progress in this domain, current benchmarks cover only a narrow range of classes and thus preclude nuanced evaluation of semantic representations. To bridge this gap, we introduce the Statistical Metric Learning Benchmark (SMLB), a large-scale benchmark with diverse and fine-grained classes built upon ImageNet-21K and WordNet. SMLB is designed to rigorously evaluate discriminative discernment and generalizability across more than 14M images, 20K classes, and 16K taxonomic nodes. Alongside, we propose novel evaluation metrics -- `overlap' for separability and `aSTD' for consistency -- which measure statistical properties of distances and are efficient and robust to changes in the number of classes. Our benchmark offers a novel perspective on evaluating the quality of representations beyond accuracy. Our findings reveal the limitations of supervised learning and the class bias inherent in SSL models, offering insights into potential areas for future model enhancement.
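The abstract names two distance-statistics metrics, `overlap' for separability and `aSTD' for consistency, without giving their formulas. As an illustration only, the sketch below assumes `overlap' measures how much the intra-class and inter-class distance distributions of an embedding overlap (lower means more separable), and `aSTD' the average spread of intra-class distances; the function names and exact definitions here are assumptions, not the paper's.

```python
import numpy as np

def pairwise_cosine_distances(x):
    # x: (n, d) embeddings -> (n, n) matrix of cosine distances.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - x @ x.T

def distance_stats(embeddings, labels, bins=50):
    """Illustrative separability/consistency statistics (assumed forms).

    overlap: histogram overlap between intra-class and inter-class
             cosine-distance distributions in [0, 1]; 0 means the two
             distributions are disjoint (perfectly separable).
    astd:    mean per-class standard deviation of intra-class distances
             (a simple proxy for consistency).
    """
    d = pairwise_cosine_distances(embeddings)
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(labels), k=1)     # unique pairs, no diagonal
    intra = d[iu][same[iu]]
    inter = d[iu][~same[iu]]
    # Cosine distance lies in [0, 2]; bin both distributions on that range.
    h_intra, _ = np.histogram(intra, bins=bins, range=(0.0, 2.0))
    h_inter, _ = np.histogram(inter, bins=bins, range=(0.0, 2.0))
    p = h_intra / h_intra.sum()
    q = h_inter / h_inter.sum()
    overlap = np.minimum(p, q).sum()
    # Average the std of within-class distances over classes with >1 sample.
    stds = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size > 1:
            sub = d[np.ix_(idx, idx)]
            stds.append(sub[np.triu_indices(idx.size, k=1)].std())
    astd = float(np.mean(stds))
    return float(overlap), astd
```

Because both quantities are computed from distance distributions rather than nearest-neighbour rankings, they stay comparable as the number of classes changes, which is the robustness property the abstract claims for the proposed metrics.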