- The paper presents an unsupervised loss function that minimizes prediction differences across randomized data augmentations in ConvNets.
- Experimental results on benchmarks including MNIST, CIFAR10, and ImageNet show state-of-the-art performance, with the largest gains when labeled data is scarce.
- Combining mutual-exclusivity with stochastic regularization improves model stability and generalizability across various architectures.
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
The paper "Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning" by Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen introduces a novel approach to semi-supervised learning in the context of Convolutional Neural Networks (ConvNets). The authors propose an unsupervised loss function that leverages the stochastic nature of training techniques like randomized data augmentation, dropout, and random max-pooling to enhance the stability and generalization of ConvNets, particularly when labeled training data is sparse.
Introduction and Motivation
ConvNets have achieved state-of-the-art performance on various computer vision tasks, but their success heavily relies on large, labeled datasets. Creating these datasets is resource-intensive, highlighting the need for effective semi-supervised learning methods that can utilize vast amounts of unlabeled data. The paper aims to address this by proposing a novel unsupervised loss function that minimizes the differences between the predictions of multiple passes of a training sample, leveraging the inherent randomness in standard training techniques.
Methodology
The core contribution of the paper is an unsupervised loss function that reduces the variability in ConvNet predictions due to stochastic transformations and perturbations. Formally, the proposed loss function minimizes the mean squared differences between predictions from multiple transformations of the same training sample. Given a set of N training samples and C classes, the loss function for the i-th sample after passing it through the network n times is given by:
$$
l_U^{TS} = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} \left\| f^j\!\left(T^j(x_i)\right) - f^k\!\left(T^k(x_i)\right) \right\|_2^2
$$

where $f^j(T^j(x_i))$ is the prediction vector for the $i$-th sample on its $j$-th pass through the network, and $T^j(x_i)$ denotes a random transformation (augmentation) of that sample.
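To make the transformation/stability term concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation; the function name and the (n, batch, classes) tensor layout are illustrative, and it assumes the same mini-batch has already been forwarded n times with dropout, random pooling, and augmentation active so that the passes differ.

```python
import torch

def transformation_stability_loss(predictions):
    """Sum of pairwise squared L2 differences across n stochastic passes.

    predictions: tensor of shape (n, batch, classes); the n passes of the
    same mini-batch differ because of randomized augmentation, dropout,
    and/or random max-pooling.
    """
    n = predictions.shape[0]
    loss = predictions.new_zeros(())
    for j in range(n - 1):
        for k in range(j + 1, n):
            diff = predictions[j] - predictions[k]        # (batch, classes)
            loss = loss + (diff ** 2).sum(dim=-1).mean()  # squared L2, averaged over the batch
    return loss
```

In practice, each of the n passes would run the network in training mode on an independently augmented copy of the mini-batch, so that dropout and random pooling remain stochastic between passes.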
The authors also incorporate a mutual-exclusivity loss function from prior work, which pushes each prediction vector toward a one-hot assignment so that the predicted class probabilities are mutually exclusive. The combined unsupervised loss function is expressed as:
$$
l_U = \lambda_1 \, l_U^{ME} + \lambda_2 \, l_U^{TS}
$$
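Below is a sketch of how the two terms might be combined, again in PyTorch and building on the transformation_stability_loss sketch above. The mutual-exclusivity penalty uses one standard formulation, $-\sum_k f_k \prod_{l \neq k} (1 - f_l)$, which is minimized when each prediction vector is one-hot; the weights lambda_me and lambda_ts stand in for $\lambda_1$ and $\lambda_2$ and are illustrative defaults rather than values from the paper.

```python
import torch

def mutual_exclusivity_loss(probs):
    """Mutual-exclusivity penalty -sum_k p_k * prod_{l != k} (1 - p_l),
    averaged over the batch; smallest when each prediction is one-hot.

    probs: tensor of shape (batch, classes) holding softmax outputs.
    """
    num_classes = probs.shape[1]
    loss = probs.new_zeros(())
    for k in range(num_classes):
        others = torch.cat([probs[:, :k], probs[:, k + 1:]], dim=1)
        loss = loss - (probs[:, k] * torch.prod(1.0 - others, dim=1)).mean()
    return loss


def combined_unsupervised_loss(stacked_probs, lambda_me=1.0, lambda_ts=1.0):
    """l_U = lambda_1 * l_ME + lambda_2 * l_TS for one mini-batch.

    stacked_probs: tensor of shape (n, batch, classes) from n stochastic
    passes; the mutual-exclusivity term is averaged over the n passes.
    """
    me = torch.stack([mutual_exclusivity_loss(p) for p in stacked_probs]).mean()
    ts = transformation_stability_loss(stacked_probs)
    return lambda_me * me + lambda_ts * ts
```

The total training loss would add this unsupervised term, computed on both labeled and unlabeled samples, to the usual supervised cross-entropy on the labeled portion of the mini-batch.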
Experimental Results
The proposed method was validated on several benchmark datasets, including MNIST, CIFAR10, CIFAR100, SVHN, NORB, and the ILSVRC 2012 challenge. Key findings include:
- MNIST: The semi-supervised approach achieved a significant reduction in error rates, particularly when only a small fraction of labeled data was used.
- SVHN and NORB: Experiments confirmed the efficacy of the proposed unsupervised loss function in improving accuracy with both the cuda-convnet and sparse convolutional network implementations, with the latter showing notable improvements.
- CIFAR10: The authors achieved a state-of-the-art error rate of 3.00%, surpassing the prior benchmark.
- CIFAR100: An error rate of 21.43% was achieved, representing the state-of-the-art for this dataset.
- ImageNet: The proposed method significantly improved the top-5 error rates on the validation set, underscoring the generalizability of the approach to large-scale datasets.
Discussion
The results demonstrate that the proposed unsupervised loss function can effectively regularize ConvNets, leading to improved generalization even with limited labeled data. The efficacy of the method is consistent across different network architectures and implementations. The significant improvements observed in datasets like MNIST and NORB with minimal labeled data highlight the potential for this approach in practical scenarios where labeled data is scarce.
Conclusion and Future Work
This paper introduces a robust method for semi-supervised learning in ConvNets, providing a systematic way to utilize unlabeled data to enhance model performance. The unsupervised loss function, which minimizes prediction variability caused by stochastic training techniques, shows promise in various applications and datasets. Future work could explore optimization methods and extensions to other neural network architectures, further broadening the impact and usability of the proposed approach in the field of machine learning.
Overall, this paper underscores the potential of leveraging stochastic transformations and perturbations for semi-supervised learning, offering a scalable solution to the challenges posed by limited labeled data in deep learning.