Decoupled Contrastive Learning (2110.06848v3)

Published 13 Oct 2021 in cs.LG and cs.CV

Abstract: Contrastive learning (CL) is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it considers two augmented "views" of the same image as positive to be pulled closer, and all other images as negative to be pushed further apart. However, behind the impressive success of CL-based techniques, their formulation often relies on heavy-computation settings, including large sample batches, extensive training epochs, etc. We are thus motivated to tackle these issues and establish a simple, efficient, yet competitive baseline of contrastive learning. Specifically, we identify, from theoretical and empirical studies, a noticeable negative-positive-coupling (NPC) effect in the widely used InfoNCE loss, leading to unsuitable learning efficiency concerning the batch size. By removing the NPC effect, we propose decoupled contrastive learning (DCL) loss, which removes the positive term from the denominator and significantly improves the learning efficiency. DCL achieves competitive performance with less sensitivity to sub-optimal hyperparameters, requiring neither large batches in SimCLR, momentum encoding in MoCo, or large epochs. We demonstrate with various benchmarks while manifesting robustness as much less sensitive to suboptimal hyperparameters. Notably, SimCLR with DCL achieves 68.2% ImageNet-1K top-1 accuracy using batch size 256 within 200 epochs pre-training, outperforming its SimCLR baseline by 6.4%. Further, DCL can be combined with the SOTA contrastive learning method, NNCLR, to achieve 72.3% ImageNet-1K top-1 accuracy with 512 batch size in 400 epochs, which represents a new SOTA in contrastive learning. We believe DCL provides a valuable baseline for future contrastive SSL studies.

Authors (6)
  1. Chun-Hsiao Yeh (7 papers)
  2. Cheng-Yao Hong (2 papers)
  3. Yen-Chi Hsu (2 papers)
  4. Tyng-Luh Liu (21 papers)
  5. Yubei Chen (32 papers)
  6. Yann LeCun (173 papers)
Citations (165)

Summary

Decoupled Contrastive Learning: Advances in Self-Supervised Representation Learning

The paper "Decoupled Contrastive Learning" presents a novel approach to address computational inefficiencies in contrastive learning (CL), a dominant paradigm in self-supervised learning (SSL). This method introduces the Decoupled Contrastive Learning (DCL) loss, which aims to enhance learning efficiency by mitigating the negative-positive-coupling (NPC) effect inherent in the popular InfoNCE loss used in CL approaches like SimCLR and MoCo. The proposed DCL loss contributes to reducing the sensitivity of the method to hyperparameter choices such as batch sizes and training epochs, delivering competitive results without demanding excessive computational resources.

In contrastive learning, distinct "views" (augmentations) of the same image are pulled together in the embedding space, while other images are pushed apart. Achieving strong performance in this setting has traditionally required substantial computation, such as large batch sizes and many training epochs. The authors identify the NPC effect as a key inefficiency of the InfoNCE loss: because the positive pair appears in both the numerator and the denominator, the contributions of positive and negative samples to the gradient are coupled, which weakens the learning signal when batches are small or when the positive pair is easy to discriminate. By removing the positive term from the denominator, DCL eliminates this coupling and improves both learning efficiency and model robustness.
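
For concreteness, here is a minimal PyTorch sketch of a DCL-style loss for a batch of paired view embeddings. The function name, normalization, and single-direction formulation are illustrative choices, not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def dcl_loss(z1, z2, temperature=0.1):
    """Illustrative DCL-style loss for paired embeddings z1, z2 of shape (N, D).

    Follows the paper's core idea: the positive pair is removed from the
    softmax denominator, giving  -pos/tau + logsumexp(negatives/tau).
    Only the view-1 -> view-2 direction is computed for brevity; a full
    version would also include the symmetric term.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)

    sim_12 = z1 @ z2.t() / temperature   # cross-view cosine similarities, (N, N)
    sim_11 = z1 @ z1.t() / temperature   # within-view similarities, (N, N)

    pos = sim_12.diag()                  # positive-pair logits, (N,)

    # Exclude the positive pair (diagonal of sim_12) and self-similarity
    # (diagonal of sim_11); everything that remains is a negative.
    off_diag = ~torch.eye(n, dtype=torch.bool, device=z1.device)
    negatives = torch.cat(
        [sim_12[off_diag].view(n, n - 1), sim_11[off_diag].view(n, n - 1)], dim=1
    )

    # Decoupled objective: the positive term no longer appears in the denominator.
    return (-pos + torch.logsumexp(negatives, dim=1)).mean()
```

In this form, the gradient with respect to the positive similarity has constant magnitude 1/τ rather than being scaled down by the softmax weight of the negatives as in InfoNCE, which reflects the removal of the NPC effect described above.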

Empirical evaluation showcases DCL's ability to maintain high performance with smaller batch sizes and shorter training schedules. Notably, SimCLR trained with the DCL loss reaches 68.2% top-1 accuracy on ImageNet-1K with a batch size of 256 within 200 epochs of pre-training, a 6.4-point improvement over the SimCLR baseline. Furthermore, when integrated with the more recent NNCLR method, DCL sets a new state of the art for contrastive learning of 72.3% top-1 accuracy on ImageNet-1K with a batch size of 512 in 400 epochs. These results highlight DCL's potential to reduce the computational cost of SSL.

Beyond providing a strong baseline for future work in contrastive SSL, the paper shows that the approach transfers across scales and domains, with additional experiments on smaller datasets such as CIFAR-10, CIFAR-100, and STL-10. The authors also report that DCL can be effectively combined with state-of-the-art SSL methods such as BYOL, improving training stability and feature representation quality.

The exploration of DCL also opens up possibilities for adaptation beyond vision, such as applying it to speech models like wav2vec 2.0. The proposed adjustments to the conventional CL framework thus reinforce its utility as a tool for advancing self-supervised representation learning in a more resource-efficient manner.

In summary, the paper presents a carefully designed modification to contrastive learning frameworks that improves training under constrained computational budgets while preserving the quality of learned representations across diverse datasets and methods. The combination of theoretical analysis and thorough empirical validation supports confidence in the DCL approach and suggests it will remain a useful building block for future SSL research.
