Model-Aware Contrastive Learning: Towards Escaping the Dilemmas (2207.07874v4)
Abstract: Contrastive learning (CL) continues to achieve significant breakthroughs across multiple domains. However, the most common InfoNCE-based methods suffer from dilemmas such as the \textit{uniformity-tolerance dilemma} (UTD) and \textit{gradient reduction}, both of which are related to a $\mathcal{P}_{ij}$ term. It has been shown that UTD can lead to unexpected performance degradation. We argue that the fixed temperature is to blame for UTD. To tackle this challenge, we enrich the CL loss family with a Model-Aware Contrastive Learning (MACL) strategy, whose temperature adapts to the magnitude of alignment, which reflects the model's current confidence in the instance discrimination task; this enables the CL loss to adjust the penalty strength on hard negatives adaptively. Regarding the other dilemma, gradient reduction, we derive the limits of the involved gradient scaling factor, which allows us to explain from a unified perspective why some recent approaches are effective with fewer negative samples, and we present a gradient reweighting scheme to escape this dilemma. Extensive empirical results on vision, sentence, and graph modalities validate that our approach generally improves representation learning and downstream task performance.
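To make the model-aware temperature idea concrete, the sketch below (not the authors' implementation) modifies a standard InfoNCE loss so that the temperature grows with the batch-mean positive-pair similarity, used here as a proxy for the alignment magnitude described above. The constants `tau0` and `alpha` and the specific adaptation rule are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def macl_style_infonce(z1, z2, tau0=0.1, alpha=0.5):
    """Illustrative InfoNCE loss with a model-aware (alignment-adaptive) temperature.

    z1, z2: embeddings of two augmented views, shape (N, D).
    tau0:   base temperature (assumed value).
    alpha:  hypothetical coefficient controlling how strongly alignment raises the temperature.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Cosine similarities between all cross-view pairs.
    sim = z1 @ z2.t()                      # (N, N)
    pos = sim.diag()                       # positive-pair similarities

    # Alignment magnitude: batch-mean positive similarity, detached so the
    # temperature acts as an adaptive schedule rather than a gradient pathway.
    align = pos.mean().detach()

    # Hypothetical adaptation rule: higher alignment (a more confident model)
    # -> larger temperature -> weaker penalty concentration on hard negatives.
    tau = tau0 * (1.0 + alpha * align.clamp(min=0.0))

    logits = sim / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Usage sketch: loss = macl_style_infonce(encoder(view1), encoder(view2))
```

This only illustrates the adaptive-temperature component; the paper's gradient reweighting for the gradient reduction issue is a separate modification not shown here.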