Review of "Does Label Smoothing Mitigate Label Noise?"
In the paper "Does Label Smoothing Mitigate Label Noise?" the authors, Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, and Sanjiv Kumar, provide a comprehensive analysis of label smoothing in the context of handling label noise in deep learning models. Label smoothing is a regularization technique commonly applied during training that redistributes some of the probability mass from the correct label to the other classes, rather than using hard one-hot targets. This paper explores the efficacy of label smoothing as a potential remedy in scenarios where label noise is prevalent.
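To make the operation concrete, here is a minimal sketch of one common smoothing convention (blending the one-hot target with the uniform distribution; the function name and example values are mine, not the paper's):

```python
import numpy as np

def smooth_labels(one_hot, alpha):
    """Blend one-hot targets with the uniform distribution.

    Replaces y with (1 - alpha) * y + alpha / K,
    where K is the number of classes.
    """
    num_classes = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / num_classes

# Example: 4 classes, true class 2, alpha = 0.2
y = np.array([0.0, 0.0, 1.0, 0.0])
print(smooth_labels(y, 0.2))  # [0.05 0.05 0.85 0.05]
```

Note that the smoothed target is still a valid probability distribution; only the peak on the labeled class is softened.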
Context and Methodology
Label noise is a well-recognized challenge in training robust deep learning models, as it can degrade model performance by causing overfitting to incorrect labels. Traditional methods to address this include loss correction techniques, which assume knowledge of the noise transition matrix. The primary contribution of this paper is the investigation of label smoothing as an alternative or complement to these conventional techniques.
The authors elucidate a theoretical connection between label smoothing and this broader class of loss-correction strategies by framing both under a generalized operation they term "label smearing." In essence, they show how label smoothing functions as a particular instance of this smearing, comparable to backward correction under symmetric noise conditions.
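The smearing view can be sketched as applying a class-by-class mixing matrix to the targets. The parametrization below is an illustrative reconstruction under symmetric noise, not the paper's exact notation: label smoothing mixes toward the uniform distribution, while backward correction applies the inverse of the noise-transition matrix (and can therefore produce negative weights):

```python
import numpy as np

def smearing_matrix_ls(num_classes, alpha):
    # Label smoothing as a smearing matrix:
    # M = (1 - alpha) * I + (alpha / K) * ones
    K = num_classes
    return (1.0 - alpha) * np.eye(K) + (alpha / K) * np.ones((K, K))

def smearing_matrix_bc(num_classes, rho):
    # Backward correction under symmetric noise with rate rho:
    # the inverse of the symmetric noise-transition matrix T
    K = num_classes
    T = (1.0 - rho) * np.eye(K) + (rho / (K - 1)) * (np.ones((K, K)) - np.eye(K))
    return np.linalg.inv(T)

# Smeared per-class weights for true class 0 (K = 3):
K = 3
y = np.eye(K)[0]
print(smearing_matrix_ls(K, 0.1) @ y)  # positive mix toward uniform
print(smearing_matrix_bc(K, 0.1) @ y)  # inverse correction, negative off-class weights
```

The structural similarity (both are "identity plus a symmetric spread") is what lets the authors treat the two methods as instances of one family.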
Empirical Findings
Through empirical evaluation on benchmark datasets such as CIFAR-10, CIFAR-100, and ImageNet, the authors substantiate that label smoothing can indeed improve test accuracy under label noise. Notably, label smoothing performs comparably to traditional correction techniques and, in some circumstances, even surpasses them. An interesting observation is that a smoothing parameter (α) higher than the actual noise rate is often most effective, highlighting the importance of tuning this parameter beyond theoretical guidelines.
Furthermore, an analysis of model confidence reveals that label smoothing notably decreases model overconfidence on noisy labels, thereby improving generalization on clean data. This contrasts with forward and backward correction methods, which tend to increase confidence.
Theoretical Insights
Beyond the empirical results, the paper offers a view of label smoothing through a regularization lens. The authors draw parallels between label smoothing and ℓ2 regularization, particularly in linear models, where such regularization is known to counteract the effects of label noise by widening decision-boundary margins. This perspective explains why label smoothing, despite the bias it introduces relative to the ground-truth distribution, can nonetheless perform effectively as a noise-mitigation strategy.
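The regularization reading can be made concrete with a standard identity: because cross-entropy is linear in its target argument, the loss on a smoothed label splits exactly into the ordinary cross-entropy plus an α-weighted term that penalizes confident predictions. A small numerical check (example values mine):

```python
import numpy as np

def cross_entropy(target, probs):
    return -np.sum(target * np.log(probs))

K, alpha = 4, 0.2
y = np.eye(K)[1]                    # one-hot target
u = np.full(K, 1.0 / K)             # uniform distribution
p = np.array([0.1, 0.7, 0.1, 0.1])  # model's predicted probabilities

# CE on the smoothed target equals a weighted sum:
# (1 - alpha) * CE(y, p)  +  alpha * CE(u, p)
smoothed = (1 - alpha) * y + alpha * u
lhs = cross_entropy(smoothed, p)
rhs = (1 - alpha) * cross_entropy(y, p) + alpha * cross_entropy(u, p)
print(np.isclose(lhs, rhs))  # True
```

The second term, CE(u, p), grows as any predicted probability approaches zero, so it explicitly discourages overconfident outputs, which is the mechanism the regularization analogy points to.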
Distillation and Label Noise
An extension of the paper examines the utility of label smoothing in knowledge distillation when the training data is noisy. Contrary to previous findings in noise-free settings, applying label smoothing to the teacher model improved student performance during distillation. This suggests that smoothing facilitates the transfer of useful, noise-robust features from teacher to student, presenting a beneficial strategy for model compression tasks under noisy conditions.
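One way to picture this setup is the construction of the student's training targets. The sketch below is a hypothetical illustration (function names, parameters, and the mixing scheme are my assumptions, not the paper's recipe): the noisy label is smoothed, then blended with the teacher's softened predictions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_targets(teacher_logits, noisy_one_hot, alpha, beta):
    """Hypothetical sketch of a distillation target under label noise.

    alpha: label-smoothing weight applied to the (possibly noisy) label
    beta:  mixing weight between the smoothed label and the teacher output
    """
    K = teacher_logits.shape[-1]
    smoothed = (1.0 - alpha) * noisy_one_hot + alpha / K
    teacher_probs = softmax(teacher_logits)
    return (1.0 - beta) * smoothed + beta * teacher_probs

# Example: 3 classes, noisy label says class 0, teacher favors class 0
targets = distillation_targets(np.array([2.0, 0.5, 0.1]), np.eye(3)[0],
                               alpha=0.1, beta=0.5)
print(targets)
```

Because the teacher was itself trained with smoothing, its output distribution is less sharply peaked on (possibly wrong) labels, which is consistent with the paper's finding that smoothed teachers transfer more noise-robust signal.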
Conclusion and Future Impacts
The paper significantly contributes to the dialogue on robust training practices in noisy environments. By connecting two previously disparate concepts—label smoothing and loss correction—the authors pave the way for integrating these techniques to leverage their respective benefits. This research not only underscores the practical implications of label smoothing in improving noise robustness but also sets the stage for further theoretical exploration into the regularization properties of label smoothing across model architectures and diverse noise models.
Anticipated future advancements could involve developing adaptive smoothing techniques that automatically adjust α in response to noise characteristics, or further investigating the synergies between label smoothing and complex architectures, paving the way for more resilient AI systems.