
Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning (2003.05438v5)

Published 11 Mar 2020 in cs.CV, cs.LG, and eess.IV

Abstract: The recently advanced unsupervised learning approaches use the siamese-like framework to compare two "views" from the same image for learning representations. Making the two views distinctive is a core to guarantee that unsupervised methods can learn meaningful information. However, such frameworks are sometimes fragile on overfitting if the augmentations used for generating two views are not strong enough, causing the over-confident issue on the training data. This drawback hinders the model from learning subtle variance and fine-grained information. To address this, in this work we aim to involve the distance concept on label space in the unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs through mixing the input data space, to further work collaboratively for the input and loss spaces. Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space. Extensive experiments are conducted on CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet and standard ImageNet with popular unsupervised methods SimCLR, BYOL, MoCo V1&V2, SwAV, etc. Our proposed image mixture and label assignment strategy can obtain consistent improvement by 1~3% following exactly the same hyperparameters and training procedures of the base methods. Code is publicly available at https://github.com/szq0214/Un-Mix.

Citations (99)

Summary

  • The paper presents Un-Mix, which integrates image mixtures with refined label assignments to mitigate overfitting in unsupervised learning.
  • It employs global and region-level mixing strategies that adapt similarity measures, yielding fine-grained visual representations across multiple datasets.
  • The approach consistently improves performance by 1-3% on benchmarks like CIFAR-10, Tiny ImageNet, and ImageNet, demonstrating its practical impact.

Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning

The paper, "Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning," introduces a novel approach aimed at enhancing unsupervised learning frameworks through the integration of image mixtures and label assignment strategies. The core premise of this paper rests on the hypothesis that conventional unsupervised learning models, which rely heavily on a siamese-like framework comparing augmented "views" from a single image, often suffer from overfitting when augmentation is weak. This limitation restricts the model’s ability to learn fine-grained and subtle variance representations.

The authors propose a methodology termed "Un-Mix," which uses image mixtures to modulate the degree of similarity between positive and negative pairs in contrastive-style unsupervised learning. The key innovation lies in introducing a soft distance concept in the label space and coupling it with mixing in the input space, so that the input and loss spaces are transformed collaboratively. The approach is verified empirically on several datasets — CIFAR-10, CIFAR-100, STL-10, Tiny ImageNet, and ImageNet — showing improved representation learning when combined with unsupervised methods such as SimCLR, BYOL, and MoCo V1/V2. The Un-Mix strategy yields consistently superior performance, with accuracy gains between 1% and 3% under the same hyperparameters and training procedures as the base methods.
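To make the soft distance idea concrete, the following is a minimal sketch, assuming a cosine-similarity contrastive setup; the function name, temperature value, and exact weighting are illustrative assumptions rather than the paper's published loss. A mixed view is treated as λ-similar to one source image and (1 − λ)-similar to the other, rather than as a strictly positive or negative pair.

```python
import torch
import torch.nn.functional as F

def soft_pair_loss(z_mixed, z_a, z_b, lam, temperature=0.1):
    """Illustrative sketch (not the paper's exact loss): the embedding of a
    mixed image is pulled toward its two source embeddings in proportion to
    the mixture coefficient lam."""
    sim_a = F.cosine_similarity(z_mixed, z_a, dim=-1) / temperature
    sim_b = F.cosine_similarity(z_mixed, z_b, dim=-1) / temperature
    # Soft targets: lam-similar to image a, (1 - lam)-similar to image b.
    return -(lam * sim_a + (1.0 - lam) * sim_b).mean()
```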

The introduction contextualizes unsupervised visual learning, positioning it at the intersection of maximizing utility from unlabeled data and devising algorithmic structures that go beyond traditional supervised learning paradigms. Here, contrastive learning emerges as a significant contender for state-of-the-art results, primarily by comparing multiple augmented views of the same data so that the learned representations capture strong mutual information between them.

The methodological framework of Un-Mix mixes the input space through two primary mechanisms: global mixtures (akin to Mixup) and region-level mixtures (similar to CutMix). These mixtures are paired with a soft label-assignment scheme that counters the tendency toward over-confident, discrete similarity targets. By mixing samples within each mini-batch and adjusting the similarity calculation according to the mixture proportion, the authors modulate the input and label spaces concurrently.
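A minimal sketch of this in-batch mixing follows, assuming each image is paired with its reverse-ordered counterpart in the mini-batch and that a Beta-distributed coefficient controls the mix; the function name and the `region_prob` switch are illustrative assumptions rather than the repository's API. The returned coefficient `lam` is what would re-weight the similarity terms, as in the sketch above.

```python
import numpy as np
import torch

def mix_minibatch(images, alpha=1.0, region_prob=0.5):
    """Illustrative sketch of Un-Mix-style in-batch mixing: each image is
    mixed with the image at the reversed index of the mini-batch, either
    globally (Mixup-like) or region-wise (CutMix-like)."""
    lam = float(np.random.beta(alpha, alpha))
    partners = torch.flip(images, dims=[0])  # pair sample i with sample B-1-i

    if np.random.rand() < region_prob:
        # Region-level mixture: paste a random rectangle from the partner image.
        _, _, H, W = images.shape
        cut_h, cut_w = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
        cy, cx = np.random.randint(H), np.random.randint(W)
        y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
        x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)
        mixed = images.clone()
        mixed[:, :, y1:y2, x1:x2] = partners[:, :, y1:y2, x1:x2]
        lam = 1.0 - (y2 - y1) * (x2 - x1) / (H * W)  # recompute from pasted area
    else:
        # Global mixture: convex combination of pixel values.
        mixed = lam * images + (1.0 - lam) * partners

    return mixed, lam
```

In training, the mixed batch would be fed through the same siamese pipeline as the original views, with `lam` used to weight the resulting loss terms.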

Subsequent sections detail experimental protocols that extensively validate Un-Mix. The improvements in evaluation metrics across diverse datasets and unsupervised models substantiate the approach's robustness. Notably, Un-Mix delivers compelling improvements in linear classification evaluations and in downstream object detection tasks, as demonstrated on the PASCAL VOC and COCO benchmarks.

In practice, Un-Mix's conceptual simplicity, composability with existing frameworks, and marginal computational overhead make it an attractive choice for advancing unsupervised learning. Beyond the practical implications, the paper offers theoretical grounding through the lens of mutual information, arguing that mixing enriches the relationships the model must capture across views and thereby refines the learned representations.

The presentation of the Un-Mix methodology holds significant implications for future research. By embedding transformations in both the input data and the label assignment, this line of inquiry suggests new directions for interrogating and improving model calibration, robustness, and generalizability, which become especially pertinent as models scale to accommodate the more complex variation inherent in real-world data.

In conclusion, the "Un-Mix" paper advances unsupervised representation learning by acting on the constructs of image mixtures and label manipulation. This paves the way for improved fidelity in learned neural representations across a variety of unsupervised paradigms, warranting further exploration and integration within broader AI model architectures.
