
Delving into Inter-Image Invariance for Unsupervised Visual Representations (2008.11702v3)

Published 26 Aug 2020 in cs.CV and cs.LG

Abstract: Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less explored. One major obstacle to exploit inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully-designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster and perform better than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup.

Authors (5)
  1. Jiahao Xie (22 papers)
  2. Xiaohang Zhan (27 papers)
  3. Ziwei Liu (368 papers)
  4. Yew Soon Ong (30 papers)
  5. Chen Change Loy (288 papers)
Citations (56)

Summary

The paper "Delving into Inter-Image Invariance for Unsupervised Visual Representations" explores the relatively under-investigated role of inter-image invariance in unsupervised contrastive learning. To address the challenge of exploiting inter-image relations without explicit pair annotations, the authors propose a unified framework, dubbed InterCLR, that integrates both intra-image and inter-image invariance learning strategies.

Key Contributions

The paper systematically investigates inter-image invariance through three core components: pseudo-label maintenance, sampling strategy, and decision boundary design. Each aspect is meticulously studied to enhance inter-image learning's effectiveness in contrastive frameworks.

  1. Pseudo-Label Maintenance: Traditional methods predominantly employ offline cluster assignments, which are computationally demanding and result in stale labels. The authors propose an online mini-batch k-means clustering method, allowing labels to be updated iteratively and synchronously with network training, which improves label reliability and convergence speed.
  2. Sampling Strategy: Recognizing the challenge of sampling for unsupervised learning, the paper introduces a semi-hard negative sampling strategy. This approach balances the presence of misleading hard negatives and the ineffectiveness of easy negatives, promoting more reliable and unbiased learning processes.
  3. Decision Boundary Design: The paper highlights the importance of margin settings in loss functions, adapting MarginNCE to suit both intra- and inter-image consistency branches. They find that intra-image learning benefits from stricter margins, whereas inter-image learning, given its inherent noise, requires more lenient margins.
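The online pseudo-label maintenance in item 1 can be sketched as a mini-batch k-means step with an exponential-moving-average centroid update. This is an illustrative reconstruction, not the paper's exact implementation; the function name and momentum value are assumptions.

```python
import numpy as np

def assign_and_update(features, centroids, momentum=0.99):
    """One online mini-batch k-means step on L2-normalized features.

    features:  (B, D) embeddings from the current mini-batch
    centroids: (K, D) cluster centroids, updated in place
    Returns the pseudo-labels (cluster indices) for the batch.
    """
    # Assign each feature to its nearest centroid; with L2-normalized
    # vectors, the dot product is cosine similarity.
    sims = features @ centroids.T            # (B, K)
    labels = sims.argmax(axis=1)             # (B,)

    # Update only the centroids touched by this batch with an
    # exponential moving average, so pseudo-labels evolve
    # synchronously with network training instead of going stale.
    for k in np.unique(labels):
        batch_mean = features[labels == k].mean(axis=0)
        centroids[k] = momentum * centroids[k] + (1 - momentum) * batch_mean
        centroids[k] /= np.linalg.norm(centroids[k])  # keep unit norm

    return labels
```

Unlike offline clustering over the entire dataset between epochs, each step here costs only one batch-by-centroid similarity computation, which is what keeps the labels fresh.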

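Items 2 and 3 can be combined in a single sketch: negatives are drawn from a middle band of similarity (excluding both the hardest candidates, which may be false negatives under noisy pseudo-labels, and the easiest, which are uninformative), and the positive logit is penalized by a margin before the softmax, as in a MarginNCE-style loss. The band boundaries, margin, and temperature below are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def semi_hard_margin_nce(anchor, positive, negatives,
                         margin=0.2, temperature=0.1,
                         band=(0.3, 0.7), num_neg=4):
    """Margin-adjusted InfoNCE over semi-hard negatives.

    anchor, positive: (D,) L2-normalized embeddings
    negatives:        (N, D) L2-normalized candidate negatives
    band: quantile range of similarity ranks to keep -- discards
          the hardest (possibly false) and easiest (uninformative)
          negatives.
    """
    sims = negatives @ anchor                 # cosine similarities, (N,)
    order = np.argsort(sims)                  # easy -> hard
    lo = int(band[0] * len(order))
    hi = int(band[1] * len(order))
    picked = order[lo:hi]                     # semi-hard band
    if len(picked) > num_neg:
        picked = np.random.choice(picked, num_neg, replace=False)

    # Subtracting a margin from the positive logit tightens the
    # decision boundary; a smaller margin is more lenient.
    pos_logit = (anchor @ positive - margin) / temperature
    neg_logits = sims[picked] / temperature
    logits = np.concatenate([[pos_logit], neg_logits])

    # InfoNCE: negative log-probability of the positive under softmax.
    return -pos_logit + np.log(np.exp(logits).sum())
```

Per the paper's finding, the inter-image branch would use a smaller (more lenient) margin than the intra-image branch, since its pseudo-label-derived positives are inherently noisier.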
Empirical Validation

InterCLR exhibits substantial improvements over established intra-image invariance learning methods when transferred to various downstream tasks, such as image classification, low-shot classification, semi-supervised learning, and object detection. Notably, it surpasses its baselines in both efficiency and effectiveness when pre-trained under a constrained computational budget.

Implications and Future Directions

The research underlines the potential of leveraging inter-image invariance in advancing unsupervised representation learning. By effectively navigating challenges around label dynamics and sampling strategy, it sets the stage for future explorations in:

  • Extending InterCLR's framework to other modalities or dense prediction tasks.
  • Investigating more sophisticated clustering techniques or adaptive margin strategies.
  • Evaluating performance under varying resource constraints, particularly in batch size and epochs, which is crucial for scalability and generalization.

The findings provide pivotal insights into refining contrastive learning approaches, ensuring a balanced integration of intra- and inter-image statistics to foster robust and generalizable representations. Given the rapid advancements in AI, such comprehensive studies form the foundation for more resilient and efficient unsupervised learning frameworks in practice.
