GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning (2401.01990v2)

Published 3 Jan 2024 in cs.CV, cs.AI, and cs.LG

Abstract: We propose Guided Positive Sampling Self-Supervised Learning (GPS-SSL), a general method to inject a priori knowledge into Self-Supervised Learning (SSL) positive samples selection. Current SSL methods leverage Data-Augmentations (DA) for generating positive samples and incorporate prior knowledge - an incorrect, or too weak DA will drastically reduce the quality of the learned representation. GPS-SSL proposes instead to design a metric space where Euclidean distances become a meaningful proxy for semantic relationship. In that space, it is now possible to generate positive samples from nearest neighbor sampling. Any prior knowledge can now be embedded into that metric space independently from the employed DA. From its simplicity, GPS-SSL is applicable to any SSL method, e.g. SimCLR or BYOL. A key benefit of GPS-SSL is in reducing the pressure in tailoring strong DAs. For example GPS-SSL reaches 85.58% on Cifar10 with weak DA while the baseline only reaches 37.51%. We therefore move a step forward towards the goal of making SSL less reliant on DA. We also show that even when using strong DAs, GPS-SSL outperforms the baselines on under-studied domains. We evaluate GPS-SSL along with multiple baseline SSL methods on numerous downstream datasets from different domains when the models use strong or minimal data augmentations. We hope that GPS-SSL will open new avenues in studying how to inject a priori knowledge into SSL in a principled manner.


Summary

  • The paper introduces GPS-SSL, which embeds prior domain knowledge into positive sampling, achieving significant accuracy gains (e.g., CIFAR-10 improved from 37.51% to 85.58%).
  • It reduces reliance on heavily tuned data augmentations by using a nearest-neighbor approach in an independent embedding space.
  • GPS-SSL integrates with methods like SimCLR and BYOL, highlighting its potential for robust performance in varied real-world applications.

Overview of Guided Positive Sampling Self-Supervised Learning (GPS-SSL)

Self-Supervised Learning (SSL) enables models to learn meaningful representations from unlabeled data. Its effectiveness typically hinges on Data-Augmentations (DAs) used to create 'positive samples': pairs of views that the model learns to treat as similar. Identifying an effective DA recipe can be difficult, particularly for under-studied or specialized datasets, and this is the gap that Guided Positive Sampling Self-Supervised Learning (GPS-SSL) aims to fill.
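To make the role of DAs concrete, below is a minimal sketch (not code from the paper) of how a standard SSL method such as SimCLR forms a positive pair by applying two independent random augmentations to the same image; the specific transforms and parameters are illustrative placeholders rather than the recipe used in the paper.

```python
from torchvision import transforms

# Illustrative SimCLR-style augmentation pipeline; the exact transforms and
# parameters are placeholders, not the recipe used in the paper.
augment = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

def make_positive_pair(image):
    """Two independent augmentations of the same image form a positive pair."""
    return augment(image), augment(image)
```

If these transforms are too weak or mismatched to the data, the positive pairs carry little semantic signal, which is the failure mode GPS-SSL targets.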

The GPS-SSL Method

GPS-SSL introduces a different way to generate positive samples that reduces dependence on heavily tuned DAs. It constructs an embedding space, independent of the DAs used during training, in which Euclidean distance acts as a proxy for semantic similarity and prior knowledge about the data domain is encoded. Positive samples are then drawn by nearest-neighbor sampling in that space. Owing to its simplicity, GPS-SSL can be combined with existing SSL methods such as SimCLR or BYOL.
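A minimal sketch of the guided positive sampling idea follows, assuming a frozen "prior" encoder (for instance, a randomly initialized or pretrained network) that defines the auxiliary metric space; the helper names, the cosine-normalized embeddings, and the choice of k are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_prior_embeddings(prior_encoder, images):
    """Embed the dataset once with the prior encoder that defines the auxiliary
    metric space (hypothetical helper; batching is omitted for brevity)."""
    return F.normalize(prior_encoder(images), dim=1)

@torch.no_grad()
def sample_guided_positive(anchor_idx, embeddings, k=5):
    """Draw a positive for the anchor from its k nearest neighbors in the prior
    embedding space, instead of relying solely on data augmentation."""
    dists = torch.cdist(embeddings[anchor_idx:anchor_idx + 1], embeddings).squeeze(0)
    dists[anchor_idx] = float("inf")  # never pick the anchor as its own positive
    neighbor_ids = torch.topk(dists, k, largest=False).indices
    choice = neighbor_ids[torch.randint(len(neighbor_ids), (1,))]
    return choice.item()
```

The sampled neighbor can then stand in for (or complement) the augmented view fed to any joint-embedding SSL objective, such as SimCLR's contrastive loss or BYOL's prediction loss.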

Key outcomes of integrating GPS-SSL include improved performance on under-studied domains and a reduced need for complex DA strategies. For example, with only weak DAs on CIFAR-10, GPS-SSL reaches 85.58% accuracy, whereas the baseline method reaches only 37.51%.

Comparison with Other Self-Supervised Learning Methods

The paper contrasts GPS-SSL with several other SSL methods to highlight its benefits. Traditional SSL methods depend on carefully tuned augmentations and are typically developed on datasets for which good augmentation recipes are well established. Problems arise when these methods are transferred to atypical datasets where such knowledge is not available. GPS-SSL's robustness to under-tuned DAs allows it to outperform baselines even when the augmentations are suboptimal or unknown.

Real-World Impact on Diverse Datasets

The paper evaluates GPS-SSL on a variety of datasets, from aircraft and medical images to hotel-room photos used in efforts to combat human trafficking. Notable improvements were observed across these domains, suggesting that GPS-SSL offers a significant advantage over baseline methods when strong DA recipes are not available.

In conclusion, GPS-SSL marks a step towards shifting the focus of SSL away from meticulously crafted DAs and towards embedding spaces that encode prior knowledge. By building semantic relationships into positive-sample selection, GPS-SSL streamlines the learning of representations attuned to the particularities of diverse data domains.
