Normalized Loss Functions for Deep Learning with Noisy Labels (2006.13554v1)

Published 24 Jun 2020 in cs.LG, cs.CV, and stat.ML

Abstract: Robust loss functions are essential for training accurate deep neural networks (DNNs) in the presence of noisy (incorrect) labels. It has been shown that the commonly used Cross Entropy (CE) loss is not robust to noisy labels. Whilst new loss functions have been designed, they are only partially robust. In this paper, we theoretically show by applying a simple normalization that: any loss can be made robust to noisy labels. However, in practice, simply being robust is not sufficient for a loss function to train accurate DNNs. By investigating several robust loss functions, we find that they suffer from a problem of underfitting. To address this, we propose a framework to build robust loss functions called Active Passive Loss (APL). APL combines two robust loss functions that mutually boost each other. Experiments on benchmark datasets demonstrate that the family of new loss functions created by our APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.

Citations (395)

Summary

  • The paper’s main contribution is showing that a simple normalization can make any loss function robust to noisy labels.
  • It proposes the Active Passive Loss (APL) framework that balances active and passive learning to mitigate underfitting, improving accuracy by at least 9% on CIFAR-100 under high noise.
  • The methodology is supported by comprehensive theoretical proofs and experiments across diverse datasets, ensuring robust performance under both symmetric and asymmetric noise conditions.

Normalized Loss Functions for Deep Learning with Noisy Labels: A Comprehensive Review

Introduction

The paper addresses the challenge of training deep neural networks (DNNs) effectively in the presence of noisy labels, a problem of both practical necessity and sustained research interest, since real-world applications seldom come with perfectly labeled data. The authors introduce a novel framework called Active Passive Loss (APL), which aims to enhance the robustness of the loss functions used in DNN training. The work is premised on the observation that while many existing loss functions are partially robust to noisy labels, none achieves full robustness without suffering from underfitting.

Theoretical Contributions

The cornerstone of the paper's theoretical contribution is the demonstration that any loss function can be rendered robust to noisy labels through a simple normalization. This principle is substantiated not only by rigorous theoretical proofs but also by empirical validation across multiple datasets and noise settings. The finding challenges the prevailing understanding and design of robust loss functions and marks a significant theoretical advance.
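
Stated concretely, for a $K$-class problem the normalization divides a loss $\mathcal{L}$ by its sum over all class labels; this restates the definition given in the paper:

$$\mathcal{L}_{\mathrm{norm}}(f(x), y) = \frac{\mathcal{L}(f(x), y)}{\sum_{j=1}^{K} \mathcal{L}(f(x), j)}$$

By construction, $\sum_{j=1}^{K} \mathcal{L}_{\mathrm{norm}}(f(x), j) = 1$ for every input $x$, which is precisely the constant-sum condition on which the robustness proofs rest.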

In detail, the paper establishes the robustness of normalized losses under both symmetric and asymmetric noise conditions. Because the normalization divides each loss value by the sum of that loss over all class labels, the normalized loss sums to a constant across the classes, and this constant-sum property is what makes the loss provably noise-robust.
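
As a concrete instance, the widely used cross entropy can be normalized in a few lines. Below is a minimal PyTorch sketch of the resulting Normalized Cross Entropy (NCE); the function name and tensor shapes are illustrative, not taken from the authors' released code:

```python
import torch
import torch.nn.functional as F

def normalized_ce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Normalized Cross Entropy: CE against the given label divided by the
    sum of CE against every class, enforcing the constant-sum condition."""
    log_probs = F.log_softmax(logits, dim=1)   # shape (batch, K)
    ce_all = -log_probs                        # CE(f(x), j) for every class j
    ce_target = ce_all.gather(1, targets.unsqueeze(1)).squeeze(1)
    return (ce_target / ce_all.sum(dim=1)).mean()
```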

Practical Implementation: Active Passive Loss (APL)

The APL framework is proposed to counteract the underfitting issue identified in purely robust loss functions. The paper categorizes existing loss functions as either "Active", explicitly optimizing only the prediction for the labeled class, or "Passive", also explicitly minimizing the probabilities of the other classes. APL combines one loss of each type so that the resulting loss leverages the advantages of both. The key insight is that by simultaneously maximizing the predicted probability of the labeled class and minimizing the probabilities of the remaining classes, APL counters underfitting while retaining theoretical robustness, as sketched below.
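
A minimal sketch of one such combination, pairing the active NCE term from the earlier snippet with a passive mean absolute error (MAE) term; the weights `alpha` and `beta` correspond to the framework's tunable coefficients, and the helper names are illustrative:

```python
import torch
import torch.nn.functional as F

def mae_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """A 'passive' loss: MAE between softmax probabilities and the one-hot
    label explicitly pushes down every wrong-class probability."""
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    return (probs - one_hot).abs().sum(dim=1).mean()

def apl_loss(logits, targets, alpha=1.0, beta=1.0):
    """Active Passive Loss: weighted sum of an active term (normalized_ce,
    from the earlier sketch) and a passive term (MAE)."""
    return alpha * normalized_ce(logits, targets) + beta * mae_loss(logits, targets)
```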

Experimental Evaluation

The experimental analysis covers benchmark datasets such as MNIST, CIFAR-10, and CIFAR-100, together with the real-world WebVision dataset, simulating both synthetic and real noisy-label conditions. Empirical results show that APL-based loss functions outperform state-of-the-art techniques, particularly under high noise rates. For instance, on CIFAR-100 with 80% symmetric noise, models trained with APL loss functions improved accuracy over baseline approaches by at least 9%.
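
For reference, synthetic symmetric noise of the kind used in these benchmarks is typically injected by flipping a fixed fraction of training labels uniformly at random to other classes. A minimal NumPy sketch under that common convention (the paper's exact protocol may differ in details):

```python
import numpy as np

def flip_labels_symmetric(labels: np.ndarray, noise_rate: float,
                          num_classes: int, seed: int = 0) -> np.ndarray:
    """Reassign a `noise_rate` fraction of labels uniformly at random
    to one of the *other* classes (symmetric label noise)."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip_idx = np.flatnonzero(rng.random(labels.shape[0]) < noise_rate)
    for i in flip_idx:
        offset = rng.integers(1, num_classes)          # in 1..K-1, never 0
        noisy[i] = (labels[i] + offset) % num_classes  # guaranteed != labels[i]
    return noisy
```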

Discussion and Implications

The implications of this research are broad, impacting both theoretical and practical fronts in AI and machine learning. Theoretically, it shifts the paradigm of loss function design towards a more inclusive understanding of robustness. Practically, it provides a blueprint for developing resilient models that can maintain high performance even when data quality is compromised.

Additionally, the paper opens new avenues for future work, particularly in exploring further combinations and characterizations of active and passive losses in other domains where label noise is prevalent. As AI systems proliferate into various sectors, the ability to handle noisy data with robust loss functions such as those presented in this paper will become increasingly crucial.

Conclusion

This paper offers profound insights into the training of DNNs under noisy label conditions, addressing both theoretical and practical challenges through its proposed Active Passive Loss framework. The normalization approach to achieving robustness and the characterization of existing loss functions into active and passive categories present a novel strategy that is both intuitive and empirically validated. As deep learning applications continue to expand, these contributions will serve as foundational tools for enhancing model reliability and performance in the face of data imperfections.