Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least (2302.09195v5)

Published 18 Feb 2023 in cs.LG and cs.AI

Abstract: Self-supervised learning (SSL) learns high-quality representations from large pools of unlabeled training data. As datasets grow larger, it becomes crucial to identify the examples that contribute the most to learning such representations. This enables efficient SSL by reducing the volume of data required. Nevertheless, quantifying the value of examples for SSL has remained an open question. In this work, we address this problem for the first time, by proving that examples that contribute the most to contrastive SSL are those that have the most similar augmentations to other examples, in expectation. We provide rigorous guarantees for the generalization performance of contrastive learning on such subsets. Through extensive experiments, we show that we can safely exclude 20% of examples from CIFAR100 and 40% from STL10 and TinyImageNet, without affecting downstream task performance. In general, subsets selected by our method outperform random subsets by over 3% across these datasets. Interestingly, we also discover the subsets that contribute the most to contrastive learning are those that contribute the least to supervised learning. Code available at https://github.com/bigml-cs-ucla/sas-data-efficient-contrastive-learning.
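
To make the selection criterion from the abstract concrete, below is a minimal sketch, not the authors' official SAS implementation (which is available at the linked GitHub repository). It assumes precomputed embeddings of a few random augmentations per example from some proxy encoder, scores each example by the similarity of its expected augmentation embedding to those of the other examples, and keeps the highest-scoring fraction.

```python
# Hypothetical sketch of expectation-based subset selection for contrastive SSL.
# Not the paper's official SAS code; the encoder, shapes, and scoring here are
# illustrative assumptions.
import numpy as np


def select_subset(aug_embeddings: np.ndarray, keep_fraction: float = 0.8) -> np.ndarray:
    """aug_embeddings: shape (n_examples, n_augmentations, dim), e.g. embeddings
    of several random augmentations per example from a proxy encoder.
    Returns indices of the examples to keep."""
    n, _, _ = aug_embeddings.shape
    # Mean embedding per example approximates the expectation over augmentations.
    mean_emb = aug_embeddings.mean(axis=1)
    mean_emb /= np.linalg.norm(mean_emb, axis=1, keepdims=True) + 1e-12
    # Pairwise cosine similarities between expected augmentation embeddings.
    sim = mean_emb @ mean_emb.T
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    # Score = average similarity of an example's augmentations to other examples.
    scores = sim.sum(axis=1) / (n - 1)
    k = int(round(keep_fraction * n))
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(100, 4, 32))  # 100 examples, 4 augmentations each
    kept = select_subset(fake_embeddings, keep_fraction=0.8)
    print(f"Keeping {len(kept)} of 100 examples")
```

In this reading of the abstract, examples whose augmentations are most similar to other examples' augmentations in expectation receive the highest scores and are retained, matching the reported ability to drop 20-40% of the data without hurting downstream performance.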
