- The paper introduces the Symmetric Cross Entropy Learning (SL) approach by combining traditional cross entropy with a robust reverse loss to mitigate noisy labels.
- It demonstrates significant accuracy gains on benchmarks such as MNIST, CIFAR-10/100, and Clothing1M by addressing both overfitting and under-learning.
- The study provides theoretical evidence for noise tolerance under both symmetric and asymmetric conditions, offering practical insights for robust DNN training.
Symmetric Cross Entropy for Robust Learning with Noisy Labels
The paper "Symmetric Cross Entropy for Robust Learning with Noisy Labels" by Yisen Wang et al., addresses the specific challenge of training deep neural networks (DNNs) when faced with noisy labels. Existing methods predominantly suffer from two main issues: overfitting to noisy labels, especially in "easy" classes, and significant under learning in "hard" classes. This paper introduces the Symmetric Cross Entropy Learning (SL) approach, which aims to mitigate these challenges through a novel combination of Cross Entropy (CE) and Reverse Cross Entropy (RCE).
Core Contribution
The primary contribution of the paper is the Symmetric Cross Entropy Learning (SL) approach, which adds a noise-robust Reverse Cross Entropy (RCE) term to the traditional CE loss. Intuitively, CE is effective at driving convergence but is highly sensitive to noisy labels. RCE, inspired by the symmetric KL-divergence, is noise-tolerant; paired with CE, it allows sufficient learning on hard classes without overfitting to noisy labels. This balanced combination improves both robustness and learning in the presence of label noise.
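For concreteness, the two terms and their weighted combination can be written as follows (reconstructed here from the paper's definitions, so the notation is approximate): q(k|x) is the label distribution (one-hot for hard labels), p(k|x) is the model's predicted distribution over K classes, and A < 0 is a constant the paper uses in place of log 0 when q is one-hot.

```latex
\ell_{ce}  = -\sum_{k=1}^{K} q(k \mid x)\,\log p(k \mid x)   % standard cross entropy
\ell_{rce} = -\sum_{k=1}^{K} p(k \mid x)\,\log q(k \mid x)   % reverse cross entropy, with \log 0 := A < 0
\ell_{sl}  = \alpha\,\ell_{ce} + \beta\,\ell_{rce}           % Symmetric Cross Entropy loss
```

The weights α and β (discussed further below) trade off CE's convergence behavior against RCE's noise tolerance.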
Theoretical Insights
The paper provides a solid theoretical foundation for SL. It proves that the RCE loss is robust to symmetric label noise, and to asymmetric label noise under additional conditions on the noise rates. RCE is simply the cross entropy computed in the reverse direction: the model's prediction plays the role of the target distribution, and the (one-hot) label plays the role of the prediction. When combined with traditional CE, the SL approach draws on the strengths of both losses: CE drives convergence and sufficient learning, while RCE contributes noise tolerance.
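The following is a minimal PyTorch-style sketch of such a loss, written for illustration rather than taken from the authors' released code; the clamp value standing in for log 0, the class count, and the default α and β are placeholder assumptions that would need tuning per dataset.

```python
import torch
import torch.nn.functional as F


class SymmetricCrossEntropy(torch.nn.Module):
    """Illustrative SL loss: alpha * CE + beta * RCE (a sketch, not the authors' code)."""

    def __init__(self, alpha=0.1, beta=1.0, num_classes=10, label_clamp=1e-4):
        super().__init__()
        self.alpha = alpha              # weight on the standard CE term
        self.beta = beta                # weight on the reverse CE term
        self.num_classes = num_classes
        self.label_clamp = label_clamp  # stands in for exp(A): replaces the zeros of the one-hot label

    def forward(self, logits, targets):
        # Standard cross entropy: -sum_k q(k|x) log p(k|x)
        ce = F.cross_entropy(logits, targets)

        # Reverse cross entropy: -sum_k p(k|x) log q(k|x), with the zero entries
        # of the one-hot label clamped so the logarithm stays finite.
        pred = F.softmax(logits, dim=1)
        one_hot = F.one_hot(targets, num_classes=self.num_classes).float()
        one_hot = torch.clamp(one_hot, min=self.label_clamp, max=1.0)
        rce = -(pred * torch.log(one_hot)).sum(dim=1).mean()

        return self.alpha * ce + self.beta * rce
```

Used as a drop-in replacement for a CE criterion, e.g. `loss = SymmetricCrossEntropy(num_classes=10)(model(images), labels)`; the paper tunes α and β per dataset and noise level.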
Empirical Validation
Empirical results on various benchmark datasets, including MNIST, CIFAR-10, CIFAR-100, and the large-scale real-world noisy dataset Clothing1M, consistently demonstrate SL's superior performance over existing methods. Key findings from these experiments include:
- On MNIST, CIFAR-10, and CIFAR-100, SL achieves significant improvements in test accuracy across a range of noise rates, under both symmetric and asymmetric label noise.
- Incorporating the SL loss into existing noise-handling methods, such as Forward correction and Label Smoothing Regularization (LSR), further improves their performance, validating SL's broad applicability.
- On Clothing1M, SL achieves the highest accuracy among the compared state-of-the-art methods, reinforcing its practical utility in real-world scenarios.
Implications and Future Developments
From a theoretical perspective, the insights from this paper contribute to a deeper understanding of DNN behavior under label noise. The observation that CE suffers from under-learning on hard classes is particularly noteworthy, as it challenges the common view that overfitting to noisy labels is the primary cause of degraded performance. Practically, SL's simplicity and ease of implementation make it an attractive addition to the repertoire of techniques for robust deep learning. Given its effectiveness, future research could explore:
- Adaptive tuning of the parameters α and β in SL to accommodate different datasets and noise characteristics.
- Integration of SL with emerging architectures and training paradigms to further enhance robustness and generalization.
- Application of SL in other domains, such as natural language processing and medical imaging, where noisy labels are prevalent.
Conclusion
The paper presents a compelling case for adopting Symmetric Cross Entropy Learning (SL) as a robust method for training deep neural networks with noisy labels. By addressing both overfitting and under-learning through a balanced loss function, SL not only improves accuracy under noisy conditions but also provides a versatile component that can be incorporated into existing methods to enhance their performance. Theoretical proofs and extensive empirical evidence underscore its potential for widespread application in noisy-label scenarios.
SL represents a significant step forward in training DNNs robustly in the presence of noisy labels, offering both theoretical insights and practical benefits. As noisy data is ubiquitous in the real world, the developments presented in this paper are poised to have a substantial impact on the field of deep learning.