The Common Stability Mechanism behind most Self-Supervised Learning Approaches (2402.14957v1)

Published 22 Feb 2024 in cs.CV and cs.LG

Abstract: The last couple of years have witnessed tremendous progress in self-supervised learning (SSL), a success that can be attributed to the introduction of useful inductive biases into the learning process, which allow models to learn meaningful visual representations while avoiding collapse. These inductive biases and constraints manifest themselves as different optimization formulations in the SSL techniques, e.g., the use of negative examples in a contrastive formulation, or the exponential moving average and predictor in BYOL and SimSiam. In this paper, we provide a framework to explain the stability mechanism of these different SSL techniques: i) we discuss the working mechanism of contrastive techniques like SimCLR and non-contrastive techniques like BYOL, SWAV, SimSiam, Barlow Twins, and DINO; ii) we argue that, despite their different formulations, these methods implicitly optimize a similar objective function, i.e., minimizing the magnitude of the expected representation over all data samples (the mean of the data distribution) while maximizing the magnitude of the expected representation of individual samples over different data augmentations; iii) we provide mathematical and empirical evidence to support our framework. We formulate different hypotheses and test them using the Imagenet100 dataset.

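To make point (ii) concrete, here is a minimal sketch (not the authors' code) of the two quantities the abstract contrasts, assuming a batch of encoder outputs laid out as [N samples, A augmentations, D dimensions]; the function name, the batch layout, and the surrogate loss at the end are illustrative assumptions rather than the paper's actual objective.

    # Minimal sketch, assuming z holds L2-normalized embeddings of shape [N, A, D]:
    # N samples, A augmentations per sample, D embedding dimensions.
    import torch

    def stability_terms(z: torch.Tensor):
        # Mean representation over all samples and augmentations ("the mean of
        # the data distribution"); its magnitude is the term to be minimized.
        batch_mean = z.mean(dim=(0, 1))                    # [D]
        collapse_term = batch_mean.norm()                  # scalar, to minimize

        # Per-sample mean over augmentations; its magnitude is the term to be
        # maximized (augmentation-invariant, sample-specific signal).
        sample_means = z.mean(dim=1)                       # [N, D]
        invariance_term = sample_means.norm(dim=1).mean()  # scalar, to maximize

        return collapse_term, invariance_term

    # Illustrative surrogate objective under this framing: gradient descent on
    # `loss` shrinks the batch mean's magnitude and grows the per-sample means'.
    z = torch.nn.functional.normalize(torch.randn(256, 2, 128), dim=-1)
    collapse, invariance = stability_terms(z)
    loss = collapse - invariance
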
References (41)
  1. Self-supervised classification network. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXI, pp.  116–132. Springer, 2022.
  2. Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371, 2019.
  3. Learning representations by maximizing mutual information across views. Advances in neural information processing systems, 32, 2019.
  4. Multimae: Multi-modal multi-task masked autoencoders. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXVII, pp.  348–367. Springer, 2022.
  5. Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254, 2021.
  6. Vicreg: Variance-invariance-covariance regularization for self-supervised learning. arXiv preprint arXiv:2105.04906, 2021.
  7. Unsupervised learning by predicting noise. In International Conference on Machine Learning, pp.  517–526. PMLR, 2017.
  8. Deep clustering for unsupervised learning of visual features. In Proceedings of the European conference on computer vision (ECCV), pp.  132–149, 2018.
  9. Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems, 33:9912–9924, 2020.
  10. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  9650–9660, 2021.
  11. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pp.  1597–1607. PMLR, 2020.
  12. Intriguing properties of contrastive losses. Advances in Neural Information Processing Systems, 34:11834–11845, 2021.
  13. Exploring simple siamese representation learning. arXiv preprint arXiv:2011.10566, 2020.
  14. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp.  248–255. IEEE, 2009.
  15. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE international conference on computer vision, pp.  1422–1430, 2015.
  16. Discriminative unsupervised feature learning with convolutional neural networks. Advances in neural information processing systems, 27, 2014.
  17. Whitening for self-supervised representation learning. In International Conference on Machine Learning, pp.  3015–3024. PMLR, 2021.
  18. On the duality between contrastive and non-contrastive self-supervised learning. arXiv preprint arXiv:2206.02574, 2022.
  19. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018.
  20. Self-supervised pretraining of visual features in the wild. arXiv preprint arXiv:2103.01988, 2021.
  21. Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems, 33:21271–21284, 2020.
  22. Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722, 2019.
  23. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  16000–16009, 2022.
  24. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670, 2018.
  25. Deep metric learning using triplet network. In Similarity-Based Pattern Recognition: Third International Workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3, pp.  84–92. Springer, 2015.
  26. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  27. Ica with reconstruction cost for efficient overcomplete feature learning. Advances in neural information processing systems, 24, 2011.
  28. Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  6707–6717, 2020.
  29. Unsupervised learning of visual representations by solving jigsaw puzzles. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI, pp.  69–84. Springer, 2016.
  30. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  31. Byol works even without batch statistics. arXiv preprint arXiv:2010.10241, 2020.
  32. Understanding self-supervised learning dynamics without contrastive pairs. In International Conference on Machine Learning, pp.  10268–10278. PMLR, 2021.
  33. Scan: Learning to classify images without labels. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X, pp.  268–285. Springer, 2020.
  34. Unsupervised semantic segmentation by contrasting object mask proposals. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  10052–10062, 2021.
  35. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning, pp.  1096–1103, 2008.
  36. Masked feature prediction for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  14668–14678, 2022.
  37. An investigation into whitening loss for self-supervised learning. Advances in Neural Information Processing Systems, 35:29748–29760, 2022.
  38. Simmim: A simple framework for masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  9653–9663, 2022.
  39. Barlow twins: Self-supervised learning via redundancy reduction. In International Conference on Machine Learning, pp.  12310–12320. PMLR, 2021.
  40. How does simsiam avoid collapse without negative samples? a unified understanding with self-supervised contrastive learning. arXiv preprint arXiv:2203.16262, 2022.
  41. Colorful image colorization. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14, pp.  649–666. Springer, 2016.
Authors (4)
  1. Abhishek Jha (23 papers)
  2. Matthew B. Blaschko (65 papers)
  3. Tinne Tuytelaars (150 papers)
  4. Yuki M. Asano (63 papers)
