Improved Baselines with Momentum Contrastive Learning
The paper "Improved Baselines with Momentum Contrastive Learning," authored by Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He of Facebook AI Research (FAIR), presents notable advancements in the field of contrastive unsupervised learning. This research centers on refining the Momentum Contrast (MoCo) framework by incorporating two critical elements from the SimCLR paradigm: an MLP projection head and enhanced data augmentation techniques. The enhancements introduced here demonstrate superior performance in image classification and object detection tasks relative to existing benchmarks.
Introduction
The primary focus of the paper is the integration of these SimCLR elements into the MoCo framework, showing that they are orthogonal to MoCo's mechanism for handling negatives. The underlying objective is robust unsupervised pre-training without large training batches. The resulting models, labeled "MoCo v2," outperform the SimCLR baselines while retaining the efficiency of the MoCo framework.
Background
Contrastive learning, which learns representations by pulling together similar pairs and pushing apart dissimilar pairs of data, is the core technique of the paper. The InfoNCE contrastive loss, as defined in prior work, underlies both MoCo and SimCLR. MoCo maintains a queue of negative samples encoded by a slowly progressing, momentum-updated key encoder, thereby decoupling the number of negatives from the batch size. SimCLR, in contrast, draws negatives from within large end-to-end batches, which demands extensive computational resources.
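The following is a minimal PyTorch-style sketch of the InfoNCE loss with queue negatives and of the momentum update of the key encoder, in the spirit of the pseudocode in the MoCo papers; the function names, tensor shapes, and default momentum value here are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k_pos, queue, tau=0.2):
    """InfoNCE over one positive key per query and a queue of negatives.

    q, k_pos: L2-normalized query/key embeddings, shape (N, C).
    queue:    K negative keys from previous batches, shape (C, K).
    """
    l_pos = torch.einsum("nc,nc->n", q, k_pos).unsqueeze(-1)  # (N, 1) positive logits
    l_neg = torch.einsum("nc,ck->nk", q, queue)               # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive is index 0
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Slowly drag the key encoder toward the query encoder."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```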
Improved Designs
Two specific improvements are studied within the MoCo framework:
- MLP Projection Head: Replacing the linear fully connected (fc) projection head with a two-layer MLP head (hidden dimension 2048, ReLU) notably improves linear classification accuracy. The gain was evaluated across several values of the temperature parameter τ, with the best performance observed at τ = 0.2.
- Enhanced Data Augmentation: Adding the extra augmentation used in SimCLR, in particular Gaussian blur, yields a significant gain on its own; combined with the MLP head, it produces a substantial increase in ImageNet linear classification accuracy. Both changes are sketched in the code below.
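A minimal sketch of the two modifications, assuming a ResNet-50 backbone with 2048-d features, a 128-d output embedding, and torchvision transforms; the blur kernel size and jitter strengths are illustrative values, not the authors' exact settings.

```python
import torch.nn as nn
from torchvision import transforms

# Two-layer MLP projection head (hidden dim 2048, ReLU), replacing the single linear fc head.
def projection_head(in_dim=2048, out_dim=128):
    return nn.Sequential(
        nn.Linear(in_dim, 2048),
        nn.ReLU(inplace=True),
        nn.Linear(2048, out_dim),
    )

# Augmentation pipeline with the added Gaussian blur
# (requires torchvision >= 0.8 for transforms.GaussianBlur).
train_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([transforms.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```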
Experiments
The paper reports detailed experiments comparing MoCo v1, MoCo v2, and SimCLR on two benchmarks:
- ImageNet Linear Classification: Under the linear classification protocol, MoCo v2 reaches 67.5% top-1 accuracy with 200-epoch pre-training and a batch size of 256, 5.6% higher than SimCLR under the same conditions (a minimal sketch of the protocol follows this list).
- VOC Object Detection Transfer Learning: Fine-tuning a Faster R-CNN detector on PASCAL VOC, MoCo v2 yields incremental gains over MoCo v1, confirming that the improvements transfer beyond classification.
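As a rough illustration of the linear classification protocol referenced above, the sketch below freezes a pre-trained encoder and trains only a linear classifier on its features; the 2048-d feature size assumes a ResNet-50 backbone, and the helper name is hypothetical.

```python
import torch.nn as nn

def build_linear_probe(encoder: nn.Module, feat_dim: int = 2048, num_classes: int = 1000) -> nn.Linear:
    """Freeze the pre-trained encoder; only the returned linear classifier is trained."""
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()  # keep batch-norm statistics fixed
    return nn.Linear(feat_dim, num_classes)
```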
Computational Efficiency
One of the key takeaways from the paper is the computational efficiency of MoCo v2. The reported memory and time costs show that MoCo's queue-based mechanism is significantly leaner than the end-to-end mechanism's large-batch requirement, making high-performance unsupervised learning accessible on a typical 8-GPU machine.
Implications and Future Work
The findings have substantive implications for both practical applications and theoretical advancements in unsupervised learning. Practically, the ability to achieve superior performance without large batch sizes democratizes access to advanced pre-training techniques. Theoretically, the demonstrated orthogonality between different components of contrastive learning frameworks suggests a modular approach can yield further enhancements.
Future research could explore additional architectural modifications, various augmentation strategies, and fine-tuning of hyperparameters, aiming to further close the performance gap between supervised and unsupervised learning models.
Conclusion
The advancements presented in "Improved Baselines with Momentum Contrastive Learning" offer a substantial contribution to the field of unsupervised representation learning. By integrating elements from SimCLR into the MoCo framework, the authors have established new, accessible benchmarks that hold promise for future research and practical applications in AI and computer vision.