- The paper proposes the S4L framework that integrates self-supervised tasks, such as Rotation and Exemplar, with semi-supervised learning to enhance image classification.
- It demonstrates that joint training with existing semi-supervised methods achieves state-of-the-art performance, with the Mix Of All Models (MOAM) reaching 91.23% top-5 accuracy on ILSVRC-2012 with only 10% of the labels.
- The approach improves transfer learning by producing generalizable representations, leading to faster convergence and higher accuracy on datasets like Places205.
Self-Supervised Semi-Supervised Learning (S4L) for Image Classification
This paper introduces Self-Supervised Semi-Supervised Learning (S4L), a framework for semi-supervised image classification. Its central premise is to leverage advances in self-supervised visual representation learning to improve semi-supervised methods. The authors propose two new semi-supervised image classification techniques under the S4L framework and benchmark them against rigorously tuned baselines and existing semi-supervised methods. They then show that jointly training S4L techniques with existing semi-supervised methods yields new state-of-the-art results on the ILSVRC-2012 dataset with only 10% of the labels.
Core Contributions
The primary contributions of the paper can be summarized as follows:
- Framework Proposal: Introduction of a new family of techniques that bridge self-supervised and semi-supervised learning methods to form the S4L framework.
- Novel Techniques: Derivation of new semi-supervised image classification methods, specifically S4L-Rotation and S4L-Exemplar.
- Benchmarking Against Strong Baselines: Comprehensive evaluation of these methods against carefully curated and finely tuned baselines.
- Joint Training for Enhanced Results: Demonstration that S4L methods, when trained alongside existing semi-supervised methods, lead to superior performance, setting a new state-of-the-art on the semi-supervised ILSVRC-2012 benchmark.
- Transfer Learning Insights: Evaluation of the general usefulness of the learned representations through transfer learning experiments on the Places205 dataset.
Methodology
The paper primarily tackles the semi-supervised image classification problem by utilizing both labeled (D_l) and unlabeled (D_u) data. The proposed learning objective is:
min_θ L_l(D_l, θ) + w · L_u(D_u, θ),
where L_l is the cross-entropy classification loss on labeled data, L_u is a self-supervised loss applied to unlabeled data, and w is a balancing weight.
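The joint objective can be sketched in a few lines. The following NumPy snippet is illustrative only (the paper trains a ResNet; the function names and the toy cross-entropy here are assumptions, not the paper's code); it shows how the supervised loss on labeled data and the weighted self-supervised loss on unlabeled data combine into one scalar:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logits (N, K)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def s4l_loss(class_logits, class_labels, ssl_logits, ssl_labels, w=1.0):
    """S4L objective: L_l on labeled images plus w * L_u on unlabeled
    images, where L_u is a self-supervised loss (e.g. rotation prediction)."""
    return cross_entropy(class_logits, class_labels) + w * cross_entropy(ssl_logits, ssl_labels)
```

In the paper, w is treated as a hyperparameter and tuned; both terms are minimized jointly over the same network weights θ.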
S4L-Rotation
This method builds on the rotation self-supervision task where images are rotated by 0°, 90°, 180°, and 270°, and the model is then trained to predict the correct rotation. This technique not only enhances the classification performance but also provides a form of data augmentation.
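The data side of the rotation task is simple: each image yields four rotated copies, each labeled with its rotation class. A minimal NumPy sketch (function name is illustrative; the paper feeds these copies to a ResNet with a 4-way rotation head):

```python
import numpy as np

def make_rotation_batch(images):
    """Given a batch of images (N, H, W, C), return all four 90-degree
    rotations plus the rotation-class label (0..3) for each copy."""
    rotated, labels = [], []
    for k in range(4):                       # 0, 90, 180, 270 degrees
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        labels.append(np.full(len(images), k))
    return np.concatenate(rotated), np.concatenate(labels)

# Example: two 4x4 single-channel images
batch = np.arange(2 * 4 * 4 * 1).reshape(2, 4, 4, 1).astype(np.float32)
x, y = make_rotation_batch(batch)
print(x.shape, y.shape)   # (8, 4, 4, 1) (8,)
```

The rotation-prediction cross-entropy is then applied to the 4-way head on unlabeled images, while labeled images additionally pass through the classification head.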
S4L-Exemplar
This method extends the exemplar self-supervision task where multiple transformed versions of each image are generated, and the model is trained to recognize these transformations. This encourages invariance to various perturbations.
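The paper realizes the exemplar task with a batch-hard triplet loss over embeddings of augmented copies. The following simplified NumPy sketch (hard margin instead of the paper's soft margin; names and the margin value are illustrative assumptions) shows the core idea: copies of the same image (same id) are pulled together, different images pushed apart:

```python
import numpy as np

def exemplar_triplet_loss(emb, ids, margin=0.5):
    """Illustrative batch-hard triplet loss for the exemplar task.
    emb: (N, D) embeddings of augmented image copies; ids: (N,) source-image
    ids, so copies of the same image share an id."""
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)  # pairwise dists
    same = ids[:, None] == ids[None, :]
    losses = []
    for i in range(len(emb)):
        pos = d[i][same[i] & (np.arange(len(emb)) != i)]
        neg = d[i][~same[i]]
        if len(pos) and len(neg):
            # hardest positive vs hardest negative (batch-hard mining)
            losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses)) if losses else 0.0
```

Minimizing this loss drives the network toward embeddings that are invariant to the augmentations (crops, flips, color jitter) used to produce the copies.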
Evaluation and Results
The authors conduct extensive experiments on the ILSVRC-2012 dataset, evaluating models with 1% and 10% of labels. Highlights of their results include:
- 10% Labels Setup: S4L-Rotation achieved a top-5 accuracy of 83.82%, edging out the strongest baseline semi-supervised methods such as VAT + Entropy Minimization.
- 1% Labels Setup: The S4L-Rotation method similarly outperformed established baselines, emphasizing its robustness with minimal labeled data.
Further, the incorporation of regularizations from semi-supervised literature into S4L methods was found to be complementary, yielding better results, such as in the Mix Of All Models (MOAM) approach. The MOAM model achieved a top-5 accuracy of 91.23% with 10% labels, demonstrating exceptional performance by joint optimization of multiple losses.
In addition, transfer learning results on the Places205 dataset suggested that models trained with the S4L framework learned more generalizable features, with faster convergence and higher final accuracy compared to pure self-supervision methods.
Implications and Future Directions
The S4L framework presents considerable practical and theoretical implications. Practically, it suggests a scalable approach to leverage both labeled and unlabeled data, making it adaptable to scenarios where annotated data is scarce or expensive to obtain. Theoretically, it bridges the gap between self-supervised and semi-supervised learning, offering a framework that can be extended to other self-supervised tasks.
Future developments could explore:
- Extending the S4L framework to other domains, such as dense image segmentation or video understanding.
- Investigating the efficacy of other self-supervised tasks within the S4L framework.
- Further refinement of joint training techniques to exploit synergy between different loss functions more effectively.
In conclusion, this paper makes a significant contribution by integrating self-supervised learning into the semi-supervised learning paradigm, demonstrating substantial improvements in image classification performance with minimal labeled data. This approach holds promise for broad applicability across various machine learning tasks where data annotation is a limiting factor.