- The paper introduces AugNet, an unsupervised learning framework that uses robust image augmentation and contrastive loss to derive powerful visual embeddings.
- It relies on self-supervised training with augmentations such as rotation, cropping, and color adjustment, enriching the training signal without manual labels.
- Experimental results on benchmarks like STL-10 and CIFAR datasets demonstrate competitive accuracy and improved image retrieval in out-of-domain scenarios.
Overview of "AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation"
The paper introduces AugNet, an innovative approach to unsupervised visual representation learning that leverages image augmentation. Traditional supervised methods in computer vision require extensive annotated datasets, whose creation is labor-intensive and costly. AugNet addresses this challenge by learning image features from unlabelled data, significantly reducing the reliance on labeled datasets.
Methodology
AugNet follows a self-supervised learning paradigm, exploiting the correlation between augmented views of the same image to build a robust embedding space. The method involves several key components:
- Augmentation Strategy: A range of augmentation techniques is applied to the images, including rotation, noise addition, cropping, resolution changes, and color adjustments. This generates different views of the same image, thereby enriching the training dataset without additional labels.
- Contrastive Loss Function: The authors adopt a contrastive loss function instead of the more traditional softmax loss, showing significant advantages in performance. The contrastive loss ensures that augmented images from the same original are represented closely in the embedding space, while those from different origins are distant.
- Embedding Procedure: Augmented images are passed through a deep convolutional neural network to produce low-dimensional embedding vectors. Keeping the vectors of similar images close in this feature space is critical for downstream clustering and retrieval tasks.
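The interplay of the components above can be illustrated with a minimal sketch. The loss below is a generic NT-Xent-style contrastive loss over cosine similarities, not necessarily the paper's exact formulation; the toy 2-D embeddings and the `temperature` value are assumptions chosen purely for demonstration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (guarding against the zero vector)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """NT-Xent-style loss: pull embeddings of two augmented views of the
    same image together, push embeddings of other images away."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    negs = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + negs))

# Toy 2-D embeddings: 'anchor' and 'view' stand in for two augmentations
# of one image; 'other' stands in for a different image in the batch.
anchor = [1.0, 0.0]
view = [0.9, 0.1]
other = [0.0, 1.0]

loss_good = contrastive_loss(anchor, view, [other])   # true augmented pair
loss_bad = contrastive_loss(anchor, other, [view])    # mismatched pair
# The true pair yields a smaller loss, so gradient descent on this
# objective draws augmented views of the same image together.
```

In a real training loop the embeddings would come from the convolutional encoder applied to augmented mini-batch images, and the loss would be averaged over all positive pairs in the batch.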
The paper incorporates extensive experimentation, varying network depths and the specific augmentation methods to evaluate the performance improvements.
Experimental Results
On benchmark datasets such as STL-10, CIFAR-10, and CIFAR-100, AugNet achieves accuracy competitive with state-of-the-art unsupervised learning algorithms. For image retrieval, the method performs particularly well on datasets where pre-trained models struggle due to domain shift, outperforming traditional methods on out-of-domain data such as anime character illustrations and human sketches.
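The retrieval task can be sketched as nearest-neighbor search in the learned embedding space. The `retrieve` helper and the toy gallery below are hypothetical stand-ins for AugNet's encoder output, shown only to illustrate the ranking step.

```python
import math

def normalize(v):
    """Scale a vector to unit length (guarding against the zero vector)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def retrieve(query, gallery, top_k=3):
    """Rank gallery embeddings by cosine similarity to the query and
    return the indices of the top_k most similar items."""
    q = normalize(query)
    scored = []
    for idx, emb in enumerate(gallery):
        e = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:top_k]]

# Toy 2-D gallery: indices 0 and 2 lie near the query direction,
# index 1 is nearly orthogonal to it.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.05]
ranking = retrieve(query, gallery, top_k=2)
# Closest gallery item comes first in the ranking.
```

Because ranking depends only on distances between learned embeddings, the same procedure applies unchanged to out-of-domain galleries such as sketches or illustrations, which is where the paper reports its retrieval gains.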
Implications and Future Work
The implications of AugNet extend to various practical uses in computer vision, chiefly in scenarios where labeled data is scarce or unavailable. The theoretical implications suggest that self-supervised approaches like AugNet can significantly bridge the gap between supervised and unsupervised representation learning, narrowing the performance discrepancy traditionally seen between these methods.
The potential next steps for this line of research may include expanding the model’s applicability to different data forms like videos, exploring its capability in tasks such as object detection and segmentation, and refining the augmentation strategies to enhance robustness against diverse data distributions.
In conclusion, AugNet presents a practical solution for advancing unsupervised learning in computer vision, emphasizing the importance of leveraging augmentation techniques in feature learning. This research paves the way for further explorations in reducing the dependence on large labeled datasets, which could fundamentally reshape the landscape of machine learning in visual tasks.