Contrastive Learning with Stronger Augmentations: An Overview
The paper "Contrastive Learning with Stronger Augmentations," authored by Xiao Wang and Guo-Jun Qi, presents an approach that improves contrastive learning by incorporating stronger data augmentations, departing from the traditionally cautious use of transformations that preserve image identity.
Conceptual Framework
Contrastive learning has emerged as a powerful methodology within unsupervised representation learning, motivated by the need to reduce reliance on extensive labeled datasets. The core idea is to map instances such that different views of the same instance are pulled together in the feature space, while views from different instances are pushed apart. This is traditionally achieved through transformations that keep augmented versions of an image recognizable as instances of the original.
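To make the pull/push objective concrete, the following is a minimal sketch of a standard InfoNCE-style contrastive loss of the kind CLSA builds on; the function name, temperature value, and tensor shapes are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query, positive_key, negative_keys, temperature=0.2):
    """Standard InfoNCE-style contrastive loss (illustrative sketch).

    query:         (N, D) embeddings of one augmented view
    positive_key:  (N, D) embeddings of the other view of the same instances
    negative_keys: (K, D) embeddings of other instances (e.g., a memory bank)
    """
    query = F.normalize(query, dim=1)
    positive_key = F.normalize(positive_key, dim=1)
    negative_keys = F.normalize(negative_keys, dim=1)

    # Positive logits: similarity between the two views of the same instance.
    l_pos = torch.sum(query * positive_key, dim=1, keepdim=True)   # (N, 1)
    # Negative logits: similarity to all other instances.
    l_neg = query @ negative_keys.t()                               # (N, K)

    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive is always at index 0, so the "label" for every row is 0.
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```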
The authors challenge this conventional approach by proposing a framework called Contrastive Learning with Stronger Augmentations (CLSA). The fundamental departure here is the use of aggressive image transformations that introduce significant distortions, which typically make retrieval of original instance identity challenging. CLSA leverages the information embedded in these distortions to potentially capture novel patterns beneficial for self-supervised learning.
Methodological Innovations
CLSA introduces a crucial component, Distributional Divergence Minimization (DDM), which mediates between weakly and strongly augmented image representations. Rather than forcing a strongly augmented query to directly match its key, which risks discarding instance-specific information, DDM supervises the strongly augmented query with the similarity distribution that its weakly augmented counterpart induces over a representation bank. The model's robustness is enhanced by optimizing this distributional loss jointly with the traditional contrastive loss.
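Under that reading, a hedged sketch of the distributional term might look as follows; the name `ddm_loss`, the bank layout, and the temperature are assumptions about one plausible formulation, not the paper's exact equations.

```python
import torch
import torch.nn.functional as F

def ddm_loss(weak_q, strong_q, bank, temperature=0.2):
    """Distributional divergence sketch (one plausible reading of DDM).

    weak_q:   (N, D) embeddings of weakly augmented queries
    strong_q: (N, D) embeddings of strongly augmented queries
    bank:     (K, D) representation bank (e.g., a MoCo-style queue)
    """
    weak_q = F.normalize(weak_q, dim=1)
    strong_q = F.normalize(strong_q, dim=1)
    bank = F.normalize(bank, dim=1)

    # Similarity distributions of each query over the representation bank.
    p_weak = F.softmax(weak_q @ bank.t() / temperature, dim=1)
    log_p_strong = F.log_softmax(strong_q @ bank.t() / temperature, dim=1)

    # Cross-entropy between the two distributions: the strongly augmented view
    # is supervised by the distribution produced from the weakly augmented view.
    return -(p_weak.detach() * log_p_strong).sum(dim=1).mean()
```

Detaching the weak-view distribution treats it as a fixed target, so gradients flow only through the strongly augmented branch.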
The stronger augmentation pipeline, inspired by automated augmentation strategies such as RandAugment, stochastically composes multiple transformations such as rotation, inversion, and solarization. These rich distortions, handled under the proposed DDM framework, do not undermine retrieval of the original instance but instead strengthen the model's ability to generalize across varied data instantiations.
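As an illustration, a RandAugment-style strong pipeline could be sketched with torchvision as below; the specific operations, magnitudes, and number of applied transforms are assumptions and may differ from the augmentations actually used in CLSA.

```python
import random
from torchvision import transforms

# Assumed pool of aggressive operations (applied to PIL images); the exact set
# and strengths in CLSA may differ.
STRONG_OPS = [
    transforms.RandomRotation(degrees=30),
    transforms.RandomInvert(p=1.0),
    transforms.RandomSolarize(threshold=128, p=1.0),
    transforms.RandomPosterize(bits=4, p=1.0),
    transforms.ColorJitter(0.8, 0.8, 0.8, 0.2),
]

def strong_augment(image, num_ops=5):
    """Stochastically apply several aggressive transformations to a PIL image."""
    for op in random.choices(STRONG_OPS, k=num_ops):
        image = op(image)
    return image

# A typical identity-preserving "weak" pipeline, for contrast.
weak_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
])
```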
Empirical Evaluation
The efficacy of CLSA is demonstrated through experiments on the ImageNet dataset, where it achieves a top-1 accuracy of 76.2% with the ResNet-50 architecture, a performance nearly matching fully supervised models. Furthermore, CLSA transfers competitively to downstream tasks such as VOC07, where it reports a score of 93.6%.
The ablation studies substantiate the utility of stronger augmentations in elevating model performance. When stronger augmentations are applied within a purely contrastive objective, without DDM, performance often stagnates or degrades. With DDM, knowledge is transferred effectively from the weakly augmented views to compensate for the distortions introduced by strong augmentations, yielding improved feature discrimination.
Theoretical and Practical Implications
The introduction of CLSA marks a meaningful improvement in self-supervised learning paradigms, demonstrating how strong augmentations, typically viewed as detrimental to instance identity, can be harnessed constructively with appropriate design adjustments like DDM. This approach has implications for extending contrastive learning methodologies beyond natural images to domains where image quality and structure may be inherently variable.
Practically, the CLSA framework offers a blueprint for enhancing existing contrastive learning models. It can seamlessly integrate with popular methods like MoCo and SimCLR, suggesting a path forward for more robust, scalable unsupervised learning systems.
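For instance, in a MoCo-style pipeline the combined objective could be assembled roughly as follows, reusing the earlier sketches; `clsa_step`, the encoder names, and the weighting `alpha` are hypothetical and only intended to show where the DDM term plugs in alongside the contrastive term.

```python
import torch

def clsa_step(weak_view_1, weak_view_2, strong_view,
              encoder_q, encoder_k, queue, alpha=1.0):
    """One hypothetical training step combining contrastive and DDM terms.

    encoder_q / encoder_k: query and momentum (key) encoders, MoCo-style.
    queue: (K, D) memory bank of past key embeddings.
    alpha: weight of the DDM term (a tunable hyperparameter, not a paper value).
    """
    q_weak = encoder_q(weak_view_1)      # weakly augmented query
    q_strong = encoder_q(strong_view)    # strongly augmented query
    with torch.no_grad():
        k = encoder_k(weak_view_2)       # key from the momentum encoder

    return info_nce_loss(q_weak, k, queue) + alpha * ddm_loss(q_weak, q_strong, queue)
```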
Future Directions
The success of CLSA opens several avenues for future research. The exploration of optimal augmentation strategies in different data contexts, the tuning of distributional loss parameters, and the application of CLSA across more diverse datasets could yield further insights and improvements in representation learning models. Additionally, extending the framework's applicability to semi-supervised settings could unify the advantages of supervised and unsupervised learning paradigms, potentially revolutionizing how models are trained within resource-constrained environments.
In summary, CLSA marks a shift in how augmentations can be leveraged to improve contrastive learning. Its effectiveness underscores the potential of stronger augmentations, guided by distributional supervision, to deliver substantial advances in unsupervised representation learning.