
Poly-View Contrastive Learning (2403.05490v1)

Published 8 Mar 2024 in cs.LG, cs.AI, cs.CV, cs.IT, math.IT, and stat.ML

Abstract: Contrastive learning typically matches pairs of related views among a number of unrelated negative views. Views can be generated (e.g. by augmentations) or be observed. We investigate matching when there are more than two related views which we call poly-view tasks, and derive new representation learning objectives using information maximization and sufficient statistics. We show that with unlimited computation, one should maximize the number of related views, and with a fixed compute budget, it is beneficial to decrease the number of unique samples whilst increasing the number of views of those samples. In particular, poly-view contrastive models trained for 128 epochs with batch size 256 outperform SimCLR trained for 1024 epochs at batch size 4096 on ImageNet1k, challenging the belief that contrastive models require large batch sizes and many training epochs.
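
For intuition about the fixed-compute claim, the back-of-the-envelope sketch below counts how many augmented views each headline configuration processes over a full run. Only the epoch counts come from the abstract; the ImageNet-1k training-set size is standard, and the poly-view per-sample view count M = 8 is a hypothetical placeholder rather than a value taken from the paper.

```python
# Rough count of augmented views processed over training.
# Epoch counts are from the abstract; M = 8 views per sample for the
# poly-view run is a hypothetical illustration, not the paper's value.
DATASET_SIZE = 1_281_167                      # ImageNet-1k training images

simclr_views   = 1024 * DATASET_SIZE * 2      # 1024 epochs, 2 views per sample
polyview_views = 128 * DATASET_SIZE * 8       # 128 epochs, hypothetical M = 8

print(f"SimCLR:    {simclr_views:.2e} views")
print(f"Poly-view: {polyview_views:.2e} views")
print(f"Ratio:     {polyview_views / simclr_views:.2f}")  # 0.50 under these assumptions
```

Batch size does not enter this per-view count; it mainly determines how many negatives each view is contrasted against in a single optimization step.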


Summary

  • The paper studies contrastive learning with more than two related views per sample (poly-view tasks), where views may be generated by augmentation or directly observed.
  • It derives new representation learning objectives from information maximization and sufficient statistics, showing that with unlimited computation one should maximize the number of related views, while under a fixed compute budget it is better to trade unique samples for additional views of each sample.
  • Empirically, poly-view contrastive models trained for 128 epochs at batch size 256 outperform SimCLR trained for 1024 epochs at batch size 4096 on ImageNet1k, challenging the belief that contrastive methods require large batch sizes and many training epochs.

Poly-View Contrastive Learning: Enhancements in Representation Learning through Multiple Views

Introduction

The paper introduces Poly-View Contrastive Learning (PVCL), a framework for improving the quality of learned representations by matching more than two related views of each sample. Views can be generated (for example, by data augmentation) or observed directly, and the training objectives are derived from information maximization and sufficient statistics rather than the standard pairwise formulation. By systematically generating and contrasting many views, PVCL aims to capture a more comprehensive and robust picture of the data, leading to improvements on downstream tasks.

Core Contributions

The primary contributions of this research can be distilled into a few key points:

  • Framework Design: PVCL generalizes the conventional two-view contrastive setup to an arbitrary number of related views per sample (poly-view tasks), covering both generated (augmented) and directly observed views.
  • Contrastive Loss Function: New contrastive objectives are derived from information maximization and sufficient statistics, designed to exploit all related views of a sample jointly so that the model separates related from unrelated samples more effectively (a schematic code sketch follows this list).
  • Empirical Validation: Experiments on ImageNet1k show that poly-view models trained for 128 epochs at batch size 256 outperform SimCLR trained for 1024 epochs at batch size 4096, indicating that additional views per sample can substitute for large batches and long training.
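
To make the loss description concrete, below is a minimal sketch of one way a poly-view contrastive objective can be arranged: every view of a sample is scored against the sample's other views (positives) and against views of other samples (negatives). The paper derives its exact objectives from information maximization and sufficient statistics; the function name, temperature value, and the averaged multi-positive formulation here are illustrative assumptions, not the paper's definitions.

```python
import torch

def polyview_infonce(z, temperature=0.1):
    """Illustrative multi-positive contrastive loss (not the paper's exact objective).

    z: (N, M, D) tensor of L2-normalised embeddings --
       N samples, M >= 2 related views each, dimension D.
    """
    N, M, D = z.shape
    flat = z.reshape(N * M, D)                       # every view acts as an anchor
    sim = flat @ flat.t() / temperature              # (N*M, N*M) similarity matrix

    # Mask out each anchor's similarity with itself.
    self_mask = torch.eye(N * M, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))

    # Views of the same sample are positives for each other.
    sample_id = torch.arange(N, device=z.device).repeat_interleave(M)
    pos_mask = (sample_id[:, None] == sample_id[None, :]) & ~self_mask

    # Average log-probability of the M - 1 positives for each anchor.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_mask.sum(dim=1)
    return loss.mean()
```

With M = 2 this reduces to a standard pairwise InfoNCE-style loss; larger M simply contributes more positive terms per anchor.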

Theoretical Framework

PVCL is grounded in the principle that leveraging multiple views of the same data yields a more holistic and nuanced representation, enabling models to learn more distinctive features. The framework operates in three steps (a schematic training-step sketch follows the list):

  1. View Generation: Applying a set of transformations (or collecting observed views) to obtain multiple related views of each input sample.
  2. Representation Learning: Encoding all views with a shared deep neural network into a common embedding space.
  3. Contrastive Loss Optimization: Applying the derived contrastive objective so that representations of the same instance across its views are pulled together, while representations of different instances are pushed apart.
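
The three steps above can be strung together as in the following schematic training step. The augmentation pipeline, encoder, and number of views are placeholders, and `polyview_infonce` refers to the illustrative loss sketched earlier, not the paper's derived objective.

```python
import torch
import torchvision.transforms as T

# Hypothetical augmentation pipeline for view generation (step 1).
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.ToTensor(),
])

def training_step(images, encoder, optimizer, num_views=4):
    """One schematic poly-view update: augment, encode, contrast.

    images: list of PIL images; encoder: maps (B, C, H, W) -> (B, D).
    """
    # Step 1: generate num_views augmented views per raw image.
    views = torch.stack(
        [torch.stack([augment(img) for img in images]) for _ in range(num_views)],
        dim=1,
    )                                       # (N, M, C, H, W)
    N, M = views.shape[:2]

    # Step 2: encode all views with a shared network and L2-normalise.
    z = encoder(views.flatten(0, 1))        # (N*M, D)
    z = torch.nn.functional.normalize(z, dim=-1).reshape(N, M, -1)

    # Step 3: optimise the multi-view contrastive objective
    # (polyview_infonce is the illustrative loss defined above).
    loss = polyview_infonce(z)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```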

Experimental Results

The empirical evaluation shows PVCL outperforming conventional two-view contrastive learning approaches. Key findings include:

  • Enhanced Representation Quality: PVCL consistently outperforms baseline models in representation quality, as evidenced by higher performance on downstream classification tasks (a generic linear-probe sketch follows this list).
  • Robustness to Variations: The learned representations maintain performance across a range of data perturbations, indicating robustness and adaptability.
  • Compute Efficiency and Scalability: On ImageNet1k, poly-view models reach better performance than SimCLR while training for far fewer epochs (128 vs. 1024) and with a much smaller batch size (256 vs. 4096).
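
Representation quality in such evaluations is commonly measured with a linear probe: freeze the pretrained encoder, fit a linear classifier on its features, and report downstream accuracy. The snippet below is a generic sketch of that protocol rather than the paper's evaluation code; the encoder and data loaders are placeholders.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(encoder, loader, device="cpu"):
    """Run the frozen encoder over a labelled dataset and collect features."""
    encoder.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x.to(device)).cpu().numpy())
        labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)

def linear_probe(encoder, train_loader, test_loader, device="cpu"):
    """Fit a linear classifier on frozen features and report test accuracy."""
    x_tr, y_tr = extract_features(encoder, train_loader, device)
    x_te, y_te = extract_features(encoder, test_loader, device)
    clf = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
    return clf.score(x_te, y_te)
```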

Implications and Future Work

The research opens a promising direction for using multi-view data to enhance representation learning. Its implications extend to domains where data is naturally available in multiple forms or captured from different perspectives. Potential future avenues include:

  • Optimization of View Generation: Exploring adaptive mechanisms for generating views that are most conducive to learning effective representations.
  • Application to Unsupervised and Semi-supervised Learning: Investigating the applicability of PVCL in settings with limited or no labeled data.
  • Integration with Other Learning Paradigms: Combining PVCL with other machine learning frameworks, such as generative models, to further enrich the learned representations.

Conclusion

In summary, the Poly-View Contrastive Learning framework offers an effective approach to improving learned representations by contrasting many related views of each sample. The work pairs a theoretical foundation built on information maximization and sufficient statistics with empirical evidence that additional views per sample can substitute for large batches and long training schedules. As such, PVCL is a meaningful step toward more capable and more compute-efficient representation learning.