- The paper proposes two loss correction techniques—backward and forward corrections—to mitigate the damaging effects of label noise during DNN training.
- It introduces a method to estimate the unknown noise transition matrix directly from the outputs of a network trained on noisy data, so the corrections can be applied without access to clean labels.
- Extensive experiments on datasets like CIFAR-10 and Clothing1M demonstrate significant accuracy improvements compared to standard cross-entropy loss.
Loss Correction Techniques for Robust Deep Neural Networks Under Label Noise
The paper "Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach" by Giorgio Patrini et al. presents novel methodologies to handle label noise during the training of deep neural networks (DNNs). This work is significant due to the prevalent issue of noisy labels in large datasets, where manual labeling is often replaced by less reliable methods such as crowdsourcing or automated querying via search engines, resulting in corrupted labels that can degrade model performance.
Methodology
The authors propose two loss correction techniques, termed "backward correction" and "forward correction." These techniques assume knowledge of the label noise transition matrix, which specifies the probabilities of one class label being mislabeled as another.
Backward correction modifies the loss function directly. Given a dataset with noisy labels, the loss evaluated at the observed label is replaced by a linear combination of the losses over all possible labels, with coefficients taken from the inverse of the noise transition matrix, denoted T⁻¹. The authors prove that this corrected loss is an unbiased estimate of the clean loss, so minimizing it preserves the integrity of DNN training in the presence of label noise.
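The core of backward correction can be sketched in a few lines of NumPy. The function below is illustrative only (the function name, the cross-entropy choice, and the argument layout are assumptions, not the authors' code); it follows the paper's convention that T[i, j] is the probability that a clean label i is observed as label j.

```python
import numpy as np

def backward_corrected_loss(probs, noisy_labels, T):
    """Backward correction (sketch): combine per-class cross-entropy losses
    using the inverse of the noise transition matrix T.

    probs        -- (n, c) predicted class probabilities
    noisy_labels -- (n,) observed (possibly corrupted) integer labels
    T            -- (c, c) noise matrix, T[i, j] = P(noisy = j | clean = i)
    """
    eps = 1e-12
    # Loss the example would incur for every possible label j: -log p(y = j | x)
    per_class_loss = -np.log(probs + eps)              # shape (n, c)
    # Apply T^{-1}: the corrected loss for each label is a linear
    # combination of the losses over all classes.
    corrected = per_class_loss @ np.linalg.inv(T).T    # shape (n, c)
    # Evaluate the corrected loss at the observed noisy label.
    idx = np.arange(len(noisy_labels))
    return corrected[idx, noisy_labels].mean()
```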
Forward correction instead modifies the model's predictions before the loss is applied: the predicted clean-class probabilities are multiplied by the noise transition matrix T, mapping them into the noisy-label space, and the loss is then computed against the observed noisy labels. Unlike backward correction, this approach does not require inverting T, which can be numerically unstable. Its guarantee is narrower, however: forward correction provably preserves the original minimizers only for proper composite losses.
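For contrast, here is a minimal sketch of forward correction under the same assumptions (illustrative names, cross-entropy loss, same convention for T):

```python
import numpy as np

def forward_corrected_loss(probs, noisy_labels, T):
    """Forward correction (sketch): pass the model's clean-class probabilities
    through T before computing cross-entropy against the noisy labels.

    probs        -- (n, c) predicted clean-class probabilities
    noisy_labels -- (n,) observed (possibly corrupted) integer labels
    T            -- (c, c) noise matrix, T[i, j] = P(noisy = j | clean = i)
    """
    eps = 1e-12
    # p(noisy = j | x) = sum_i p(clean = i | x) * T[i, j]
    noisy_probs = probs @ T                             # shape (n, c)
    idx = np.arange(len(noisy_labels))
    return -np.log(noisy_probs[idx, noisy_labels] + eps).mean()
```

Note that only a matrix product is needed here, which is why forward correction avoids the numerical issues of inverting T.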
Noise Estimation
One critical requirement for these corrections is knowledge of the noise transition matrix T, which is rarely available in practice. The authors therefore present a method to estimate it: a model is first trained directly on the noisy labels, and for each class the example on which that model is most confident is treated as a "perfect example" of the class; the model's predicted distribution on that example is read off as the corresponding row of T. This two-phase procedure first estimates the transition matrix from the noisy model's outputs and then trains a new model under the corrected loss derived from the estimate, all without requiring ground-truth labels.
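A compact sketch of this estimation step is shown below. It is a simplified rendering of the anchor-point idea described above (the function name and the use of a plain argmax are assumptions; the paper also discusses more robust variants of picking the most confident example):

```python
import numpy as np

def estimate_T(noisy_model_probs):
    """Estimate the noise transition matrix from the softmax outputs of a
    model trained on noisy labels (sketch of the "perfect example" idea).

    noisy_model_probs -- (n, c) predicted probabilities p(noisy = j | x)
                         on a held-out set of unlabeled examples
    """
    n, c = noisy_model_probs.shape
    T_hat = np.zeros((c, c))
    for i in range(c):
        # Treat the example the model is most confident belongs to class i
        # as an approximate "perfect example" of that class ...
        anchor = np.argmax(noisy_model_probs[:, i])
        # ... and read off row i of T from its predicted distribution.
        T_hat[i] = noisy_model_probs[anchor]
    # Row-normalize so each row is a valid conditional distribution.
    return T_hat / T_hat.sum(axis=1, keepdims=True)
```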
Experimental Evaluation
The authors validate their approach with extensive experiments on a variety of datasets, including MNIST, IMDB, CIFAR-10, CIFAR-100, and the large-scale Clothing1M dataset. They test their methods across architectures including fully connected networks, convolutional networks, Long Short-Term Memory (LSTM) networks, and residual networks (ResNets). The results show that the proposed loss corrections significantly improve robustness to label noise, generally outperforming models trained with standard cross-entropy loss. In particular:
- On MNIST and IMDB datasets, both backward and forward corrections demonstrated resilience to high levels of asymmetric noise, maintaining substantially higher accuracy compared to uncorrected models.
- On CIFAR-10 and CIFAR-100, the forward correction technique consistently delivered better performance, particularly under high noise regimes.
- The methods also established a new state-of-the-art on the Clothing1M dataset.
Theoretical Implications
The paper provides important insights into the theoretical foundations of robust learning. By proving that backward correction yields an unbiased estimate of the true loss and that forward correction preserves the optimal minimizers under certain conditions, it sets a strong precedent for loss correction techniques in noisy environments. Moreover, the authors show that for networks with ReLU activations, the Hessian of the corrected loss is independent of class-dependent label noise, underscoring the robustness of ReLU activations in noisy contexts.
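In notation reconstructed from the description above (a paraphrase rather than the paper's exact statement), with c classes, noise matrix T, and ℓ_j the loss evaluated as if the label were class j, the unbiasedness property of backward correction can be written as:

```latex
% Backward-corrected loss: one entry per possible observed label, obtained by
% applying the inverse noise matrix to the vector of per-class losses
\ell^{\leftarrow}(\hat{p}(x)) = T^{-1}\,\ell(\hat{p}(x)),
\qquad
\ell(\hat{p}(x)) = \bigl(\ell(e_1,\hat{p}(x)),\dots,\ell(e_c,\hat{p}(x))\bigr)^{\top}

% Unbiasedness: averaging the corrected loss over the label noise
% recovers the loss under the clean label y
\mathbb{E}_{\tilde{y}\,\mid\,y}\bigl[\ell^{\leftarrow}_{\tilde{y}}(\hat{p}(x))\bigr]
  = \ell_{y}(\hat{p}(x))
```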
Practical Implications
Practically, the proposed techniques offer a robust and architecture-agnostic means to improve DNN training with noisy labels. By aligning theoretical rigor with practical efficacy, they provide a comprehensive solution to a common problem in machine learning. Future research could refine the noise estimation process, especially in complex multi-class scenarios, and explore the applicability of these methods to instance-dependent noise. These advancements could pave the way for more reliable pre-training using vast, albeit noisily labeled, datasets.
In conclusion, this paper delivers significant contributions to the robustness of DNNs against label noise. The loss correction approaches, bolstered by theoretical validation and empirical success, make the paper a valuable resource for researchers addressing the challenges posed by noisy data.