Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination (2008.03813v5)

Published 9 Aug 2020 in cs.CV, cs.LG, and stat.ML

Abstract: Unsupervised feature learning has made great strides with contrastive learning based on instance discrimination and invariant mapping, as benchmarked on curated class-balanced datasets. However, natural data could be highly correlated and long-tail distributed. Natural between-instance similarity conflicts with the presumed instance distinction, causing unstable training and poor performance. Our idea is to discover and integrate between-instance similarity into contrastive learning, not directly by instance grouping, but by cross-level discrimination (CLD) between instances and local instance groups. While invariant mapping of each instance is imposed by attraction within its augmented views, between-instance similarity could emerge from common repulsion against instance groups. Our batch-wise and cross-view comparisons also greatly improve the positive/negative sample ratio of contrastive learning and achieve better invariant mapping. To effect both grouping and discrimination objectives, we impose them on features separately derived from a shared representation. In addition, we propose normalized projection heads and unsupervised hyper-parameter tuning for the first time. Our extensive experimentation demonstrates that CLD is a lean and powerful add-on to existing methods such as NPID, MoCo, InfoMin, and BYOL on highly correlated, long-tail, or balanced datasets. It not only achieves new state-of-the-art on self-supervision, semi-supervision, and transfer learning benchmarks, but also beats MoCo v2 and SimCLR on every reported performance attained with a much larger compute. CLD effectively brings unsupervised learning closer to natural data and real-world applications. Our code is publicly available at: https://github.com/frank-xwang/CLD-UnsupervisedLearning.

Authors (3)
  1. Xudong Wang (113 papers)
  2. Ziwei Liu (368 papers)
  3. Stella X. Yu (65 papers)
Citations (73)

Summary

Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination

The paper "Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination" introduces an innovative approach to unsupervised feature learning by integrating instance-group discrimination into the existing framework of contrastive learning. This method, termed Cross-Level Discrimination (CLD), addresses significant challenges faced in unsupervised learning, especially in contexts involving highly correlated and long-tailed data distributions.

Fundamental Premise and Methodology

The key insight of the approach is to move beyond conventional contrastive learning, which typically emphasizes instance-level discrimination and invariant mapping. While effective on curated datasets, these methods falter when natural between-instance similarities occur, such as near-duplicate frames in sequential imagery or multiple views of the same object. Forcing every instance to be distinct then conflicts with the actual data distribution, leading to unstable training.

CLD seeks to incorporate between-instance similarities by imposing discrimination not just between individual instances but across varying abstraction levels — specifically, between individual instances and localized instance groups. This is achieved by:

  • Introducing a batch-wise, cross-view contrastive comparison that improves the positive-to-negative sample ratio and thereby strengthens invariant mapping.
  • Imposing the grouping and discrimination objectives on separate features derived from a shared representation, improving the model's robustness to natural data variation (see the sketch after this list).
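To make the cross-level objective concrete, below is a minimal PyTorch sketch rather than the authors' released implementation: it forms local groups with a few spherical k-means steps on one view's grouping-branch features within the batch, then contrasts the other view's instance-branch features against the resulting group centroids. The function and argument names (`cross_level_loss`, `inst_v1`, `group_v2`, the group count and temperature) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cross_level_loss(inst_v1, group_v2, num_groups=32, temperature=0.2, iters=5):
    # inst_v1:  (N, D) instance-branch features from view 1, assumed L2-normalized.
    # group_v2: (N, D) grouping-branch features from view 2, assumed L2-normalized.
    N, _ = group_v2.shape
    with torch.no_grad():
        # Local grouping within the batch: a few spherical k-means steps.
        centroids = group_v2[torch.randperm(N, device=group_v2.device)[:num_groups]].clone()
        for _ in range(iters):
            assign = (group_v2 @ centroids.t()).argmax(dim=1)              # (N,)
            for k in range(num_groups):
                members = group_v2[assign == k]
                if len(members) > 0:
                    centroids[k] = F.normalize(members.mean(dim=0), dim=0)
    # Recompute centroids from grad-carrying features so the grouping branch learns.
    one_hot = F.one_hot(assign, num_groups).float()                        # (N, K)
    counts = one_hot.sum(dim=0).clamp(min=1).unsqueeze(1)                  # (K, 1)
    centroids = F.normalize(one_hot.t() @ group_v2 / counts, dim=1)        # (K, D)
    # Cross-level contrast: each view-1 instance is attracted to the centroid of
    # the group its view-2 counterpart falls into, and repelled from the others.
    logits = inst_v1 @ centroids.t() / temperature                         # (N, K)
    return F.cross_entropy(logits, assign)
```

In the paper's framing, a term of this kind is added alongside the instance-level contrastive loss of the base method (e.g., NPID or MoCo); a symmetric term with the two views swapped is a natural extension of this sketch.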

The method also uses normalized projection heads and, for the first time, proposes unsupervised hyper-parameter tuning; both help stabilize feature learning across views. A toy two-branch head is sketched below.
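As a loose illustration only, assuming the head is a pair of linear projections on a shared backbone feature with L2-normalized outputs (the class and branch names below are invented for exposition and do not mirror the released code):

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchHead(nn.Module):
    # Toy head over a shared backbone representation: one branch feeds
    # instance-level contrast, the other feeds local grouping, and both
    # outputs are L2-normalized so dot products act as cosine similarities.
    def __init__(self, in_dim=2048, out_dim=128):
        super().__init__()
        self.instance_branch = nn.Linear(in_dim, out_dim)
        self.group_branch = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        inst = F.normalize(self.instance_branch(x), dim=1)  # instance discrimination features
        grp = F.normalize(self.group_branch(x), dim=1)      # cross-level grouping features
        return inst, grp
```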

Numerical and Empirical Findings

The effectiveness of the proposed CLD method is demonstrated extensively across highly correlated, long-tailed, and standard balanced datasets. The paper reports state-of-the-art results when CLD is added to existing methods such as NPID, MoCo, InfoMin, and BYOL, with tangible improvements in:

  • Top-1 and Top-5 classification accuracy across various datasets.
  • Class separation and retrieval quality, measured with metrics such as Normalized Mutual Information (NMI) and retrieval accuracy (a brief NMI example follows this list).
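As a brief, generic example of the NMI evaluation (the helper name and inputs are assumptions; scikit-learn is used purely for illustration), one can cluster the learned embeddings and score agreement with ground-truth labels:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def feature_nmi(features: np.ndarray, labels: np.ndarray, num_classes: int) -> float:
    # Cluster the learned embeddings, then measure agreement with ground-truth labels.
    clusters = KMeans(n_clusters=num_classes, n_init=10).fit_predict(features)
    return normalized_mutual_info_score(labels, clusters)
```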

The method notably outperforms existing models like MoCo v2 and SimCLR, matching or exceeding every reported result those models attain with a much larger compute budget.

Implications and Future Directions

The CLD approach significantly bridges the gap between contrastive learning and practical unsupervised learning applications by enabling feature extraction models to be more responsive to the inherent distribution of real-world data. This enhancement has broad implications for deployment in domains such as video processing, where data correlations are more pronounced.

Future developments could explore extending these principles to more complex data structures beyond images, such as temporal sequences and multi-modal data. Further improvement in model performance might benefit from more dynamic group formation strategies and adaptive optimization procedures.

In sum, the introduction of cross-level discrimination marks a substantial advance for unsupervised feature learning, not only in increasing robustness but also by providing a stepping stone towards more adaptive, natural data-aligned learning frameworks.