On the duality between contrastive and non-contrastive self-supervised learning (2206.02574v3)

Published 3 Jun 2022 in cs.LG, cs.AI, and cs.CV

Abstract: Recent approaches in self-supervised learning of image representations can be categorized into different families of methods and, in particular, can be divided into contrastive and non-contrastive approaches. While differences between the two families have been thoroughly discussed to motivate new approaches, we focus more on the theoretical similarities between them. By designing contrastive and covariance-based non-contrastive criteria that can be related algebraically and shown to be equivalent under limited assumptions, we show how close those families can be. We further study popular methods and introduce variations of them, allowing us to relate this theoretical result to current practices and show the influence (or lack thereof) of design choices on downstream performance. Motivated by our equivalence result, we investigate the low performance of SimCLR and show how it can match VICReg's with careful hyperparameter tuning, improving significantly over known baselines. We also challenge the popular assumption that non-contrastive methods need large output dimensions. Our theoretical and quantitative results suggest that the numerical gaps between contrastive and non-contrastive methods in certain regimes can be closed given better network design choices and hyperparameter tuning. The evidence shows that unifying different SOTA methods is an important direction to build a better understanding of self-supervised learning.
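To make the two families concrete, below is a minimal, self-contained sketch (not taken from the paper) of the kinds of criteria the abstract contrasts: a SimCLR-style contrastive loss and a VICReg-style covariance-based non-contrastive loss. The batch size, embedding dimension, temperature, and loss weights are illustrative assumptions, and the non-contrastive terms are simplified relative to the published methods.

```python
# Illustrative sketch of the two loss families discussed in the abstract.
# Hyperparameters and simplifications are assumptions, not the paper's setup.
import torch
import torch.nn.functional as F


def simclr_loss(z_a, z_b, temperature=0.1):
    """Contrastive (InfoNCE) loss over two batches of embeddings.

    z_a, z_b: (N, D) embeddings of two augmented views of the same N images.
    """
    n = z_a.size(0)
    z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                          # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                      # exclude self-pairs
    # The positive for sample i is its other view, located N rows away.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Covariance-based non-contrastive loss (simplified VICReg-style)."""
    n, d = z_a.shape
    # Invariance: the two views of each image should map to nearby embeddings.
    inv = F.mse_loss(z_a, z_b)
    # Variance: keep each embedding dimension's std above a margin of 1.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = F.relu(1 - std_a).mean() + F.relu(1 - std_b).mean()
    # Covariance: penalize off-diagonal covariance to decorrelate dimensions.
    za = z_a - z_a.mean(dim=0)
    zb = z_b - z_b.mean(dim=0)
    cov_a = (za.t() @ za) / (n - 1)
    cov_b = (zb.t() @ zb) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return sim_w * inv + var_w * var + cov_w * cov


if __name__ == "__main__":
    # Random embeddings stand in for the outputs of a shared encoder/projector.
    z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
    print(simclr_loss(z_a, z_b).item(), vicreg_loss(z_a, z_b).item())
```

The contrastive loss operates on the sample-by-sample similarity (Gram) matrix, while the non-contrastive loss operates on the dimension-by-dimension covariance matrix; the paper's equivalence result relates these two views under limited assumptions.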
