- The paper demonstrates that unsupervised CNN training using surrogate classes yields robust classification and descriptor matching performance.
- Methodology leverages diverse image transformations to generate surrogate labels that help the network learn invariant and discriminative features.
- Exemplar-CNN features outperform hand-crafted descriptors such as SIFT on matching tasks and surpass previous unsupervised methods, achieving state-of-the-art classification accuracy on STL-10 and Caltech-101.
Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks
Abstract
The paper "Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks" by Alexey Dosovitskiy et al. presents a methodology for training Convolutional Neural Networks (CNNs) using unlabeled data. The approach leverages surrogate classes formed by applying various transformations to randomly sampled image patches. The resultant feature representation is assessed for its robustness and effectiveness across several image classification and descriptor matching tasks.
Methodology
The core of the approach is to train a CNN without class labels by generating surrogate classes through data augmentation. Each surrogate class consists of transformed versions of a single seed image patch sampled at random from unlabeled images. The transformations include translation, rotation, scaling, contrast adjustment, and color variation. Training the network to assign all transformed versions of a patch to the same surrogate class forces it to discriminate between surrogate classes while becoming invariant to the applied transformations.
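A minimal sketch of how one such surrogate class might be generated is shown below, using Pillow for the transformations; the `make_surrogate_class` helper, the sampling ranges, and the 32-pixel output size are illustrative assumptions, not the paper's exact settings.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def make_surrogate_class(seed_patch, n_samples=100, out_size=32):
    """Generate one surrogate class: random transformations of a single seed patch.

    seed_patch: HxWx3 uint8 NumPy array, assumed somewhat larger than out_size
    (e.g. 64x64) so that random crops fit. All returned samples share one
    surrogate label. Parameter ranges below are illustrative, not the paper's.
    """
    seed = Image.fromarray(seed_patch)
    samples = []
    for _ in range(n_samples):
        # Rotation and scaling.
        angle = random.uniform(-20, 20)
        scale = random.uniform(0.8, 1.2)
        size = int(round(seed.width * scale))
        img = seed.rotate(angle, resample=Image.BILINEAR).resize((size, size))
        # Random translation via a random crop of the transformed patch.
        max_shift = max(img.width - out_size, 0)
        x = random.randint(0, max_shift)
        y = random.randint(0, max_shift)
        img = img.crop((x, y, x + out_size, y + out_size))
        # Contrast and color jitter.
        img = ImageEnhance.Contrast(img).enhance(random.uniform(0.5, 1.5))
        img = ImageEnhance.Color(img).enhance(random.uniform(0.5, 1.5))
        samples.append(np.asarray(img))
    return samples
```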
The network architectures evaluated range from a relatively small model, 64c5-64c5-128f, to larger ones such as 92c5-256c5-512c5-1024f, where NcS denotes a convolutional layer with N filters of size S×S and Nf denotes a fully connected layer with N units. Optimization uses stochastic gradient descent with momentum, and dropout is applied to the fully connected layers to prevent overfitting.
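As an illustration, a rough PyTorch sketch of the smallest architecture, 64c5-64c5-128f, is given below; the pooling layout, padding, dropout rate, number of surrogate classes, and learning rate are assumptions made for this sketch rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

class SmallExemplarCNN(nn.Module):
    """Sketch of 64c5-64c5-128f: two 5x5 convolutional layers with 64 filters
    each, a 128-unit fully connected layer, and a classifier over surrogate
    classes. Pooling and padding choices here are illustrative."""

    def __init__(self, num_surrogate_classes, in_size=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 64 * (in_size // 4) * (in_size // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Dropout(0.5),                        # dropout on the fully connected layer
            nn.Linear(128, num_surrogate_classes),  # one output per surrogate class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallExemplarCNN(num_surrogate_classes=8000)  # 8000 classes: illustrative choice
# SGD with momentum, as in the paper; the learning rate is an assumed value.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```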
Experimental Results
The paper presents substantial evidence of the efficacy of the proposed method across multiple datasets: STL-10, CIFAR-10, Caltech-101, and Caltech-256. The Exemplar-CNNs consistently outperform previous unsupervised learning methods; classification accuracies of 74.2% on STL-10 and 87.1% on Caltech-101, obtained with an SVM trained on the fixed feature representation, represent a significant improvement over the prior state of the art in unsupervised feature learning.
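For context, the numbers above come from training a linear classifier (the paper uses an SVM) on the fixed, unsupervised features. The sketch below illustrates that evaluation protocol with random placeholder data standing in for pooled Exemplar-CNN activations; the regularization constant is an assumed value.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholder features standing in for pooled Exemplar-CNN activations;
# in the actual protocol these come from the frozen, unsupervised network.
X_train, y_train = rng.normal(size=(500, 512)), rng.integers(0, 10, 500)
X_test, y_test = rng.normal(size=(200, 512)), rng.integers(0, 10, 200)

clf = LinearSVC(C=1.0)  # linear SVM on fixed features; C is an assumed value
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```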
An interesting aspect highlighted is the robust performance of these generic features on geometric matching problems: the Exemplar-CNN features were shown to outperform the SIFT descriptor on descriptor matching tasks. Incorporating additional transformations such as blur during training further improves matching performance on blurred images, showing that the set of transformations can be tailored to the target task.
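A hedged sketch of the nearest-neighbour matching step underlying such descriptor comparisons is shown below, with random vectors standing in for CNN-based or SIFT descriptors; the full benchmark evaluates matching quality over many image pairs and transformation types, which is omitted here.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """Nearest-neighbour matching between two sets of patch descriptors.
    Rows are feature vectors, e.g. pooled CNN activations or SIFT descriptors."""
    # Pairwise Euclidean distances between all descriptor pairs.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    # For each descriptor in desc_a, index of its closest match in desc_b.
    return dists.argmin(axis=1)

# Toy usage with random descriptors standing in for real features.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(5, 128)), rng.normal(size=(7, 128))
print(match_descriptors(a, b))
```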
Numerical Highlights
- Classification Accuracy: Exemplar-CNN achieves 74.2% on STL-10 and 87.1% on Caltech-101.
- Performance on Descriptor Matching: Features from Exemplar-CNN outperform SIFT and those obtained from AlexNet trained on ImageNet.
Theoretical and Practical Implications
The theoretical implications of this paper suggest that discriminative objectives can be highly effective for unsupervised learning, particularly when combined with well-chosen data augmentations to generate surrogate labels. This approach yields feature representations that generalize well across a range of tasks, including tasks that differ substantially from the surrogate classification task used for training.
Practically, the methodology provides a robust framework for scenarios where labeled data is scarce or expensive to obtain. It offers a path forward for developing efficient models capable of performing comparably to supervised models in several domains, which is especially advantageous in real-world applications where class labels are often unavailable.
Future Developments
Future work could explore combining this approach with semi-supervised learning techniques or further optimizing the types and magnitudes of transformations used to create surrogate labels. Another potential avenue is integrating clustering techniques within the framework to dynamically adjust and refine surrogate classes during training. Additionally, advancements can be made in extending the models to handle more complex and varied data scenarios, potentially enhancing their robustness and performance further.
Conclusion
The paper by Dosovitskiy et al. makes a significant contribution to the domain of unsupervised feature learning with CNNs. By leveraging surrogate classes formed through diverse transformations, the approach successfully trains networks to learn invariant and discriminative features. The experimental success across multiple datasets and tasks highlights the model's robustness and flexibility, positioning it as a strong alternative to traditional supervised learning methodologies in many contexts.