Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations (2110.12088v2)

Published 22 Oct 2021 in cs.LG and stat.ML

Abstract: Existing research on learning with noisy labels mainly focuses on synthetic label noise. Synthetic noise, though it has clean structures that greatly enable statistical analyses, often fails to model real-world noise patterns. The recent literature has seen several efforts to offer real-world noisy datasets, yet the existing efforts suffer from two caveats: (1) the lack of ground-truth verification makes it hard to theoretically study the properties and treatment of real-world label noise; (2) these efforts are often of large scale, which may result in unfair comparisons of robust methods within reasonable and accessible computation power. To better understand real-world label noise, it is crucial to build controllable and moderate-sized real-world noisy datasets with both ground-truth and noisy labels. This work presents two new benchmark datasets, CIFAR-10N and CIFAR-100N, which equip the training datasets of CIFAR-10 and CIFAR-100 with human-annotated real-world noisy labels collected from Amazon Mechanical Turk. We quantitatively and qualitatively show that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed and adopted ones (e.g., class-dependent label noise). We then initiate an effort to benchmark a subset of the existing solutions using CIFAR-10N and CIFAR-100N. We further proceed to study the memorization of correct and wrong predictions, which further illustrates the difference between human noise and class-dependent synthetic noise. We show that real-world noise patterns indeed impose new and outstanding challenges as compared to synthetic label noise. These observations require us to rethink the treatment of noisy labels, and we hope the availability of these two datasets will facilitate the development and evaluation of future learning-with-noisy-labels solutions. Datasets and leaderboards are available at http://noisylabels.com.

Authors (6)
  1. Jiaheng Wei (30 papers)
  2. Zhaowei Zhu (29 papers)
  3. Hao Cheng (190 papers)
  4. Tongliang Liu (251 papers)
  5. Gang Niu (125 papers)
  6. Yang Liu (2253 papers)
Citations (212)

Summary

Insights into Learning with Noisy Labels: A Study Using Real-World Human Annotations

The exploration of learning algorithms in the presence of noisy labels has primarily been dominated by synthetic noise models, whose clean mathematical structures facilitate rigorous statistical analyses. However, these synthetic models have proven inadequate for capturing the complexity and variability of real-world noisy labels. This paper revisits learning with noisy labels by introducing new benchmark datasets, CIFAR-10N and CIFAR-100N, which seek to illuminate the landscape of real-world label noise. These datasets, enriched with human annotations from Amazon Mechanical Turk, provide a basis for evaluating and further developing methods that handle noisy labels more effectively.
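
For readers who want to inspect the labels directly, the following is a minimal loading sketch. It assumes the CIFAR-10N labels are shipped as a PyTorch dictionary of per-image label arrays keyed by annotation type, as in the authors' public release; the file name and dictionary keys below are assumptions based on that release, not guarantees.

```python
import numpy as np
import torch

# Load the human-annotated label sets released with CIFAR-10N.
# Assumed format: a pickled dict of numpy arrays keyed by annotation type
# (newer PyTorch versions may require torch.load(..., weights_only=False)).
labels = torch.load("CIFAR-10_human.pt")

clean = np.asarray(labels["clean_label"])      # original CIFAR-10 training labels
aggregate = np.asarray(labels["aggre_label"])  # majority vote over three annotators
worst = np.asarray(labels["worse_label"])      # most noisy label chosen per image

# Empirical noise rate: fraction of training images whose human-annotated
# label disagrees with the ground-truth CIFAR-10 label.
for name, noisy in [("aggregate", aggregate), ("worst", worst)]:
    print(f"{name:>9} noise rate: {np.mean(noisy != clean):.2%}")
```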

Key Contributions

  1. Introduction of CIFAR-N Benchmarks: The authors present two benchmark datasets, CIFAR-10N and CIFAR-100N, collectively termed CIFAR-N, whose training images carry human-annotated noisy labels. These datasets serve as essential tools for studying instance-dependent label noise, diverging from the class-dependent assumptions of synthetic noise models.
  2. Real-World Noise Patterns: The paper finds that real-world noisy labels gathered from human annotations typically follow instance-dependent patterns, in contrast to the class-dependent patterns of classical synthetic noise. Noisy labels flip most often to classes with similar visual features, highlighting a nuanced, localized noise dependency (see the transition-matrix sketch after this list).
  3. The Complexity of Human Annotation Noise: A significant observation is the feature-dependent nature of noise transitions. Real-world annotations exhibit an intricate dependency on individual instances, contrary to synthetic model assumptions, prompting a reevaluation of traditional noise modeling.
  4. Performance Evaluation of Robust Methods: By benchmarking a range of existing methods on the CIFAR-N datasets, the paper shows how differently these methods perform under real-world noise versus synthetic noise, underscoring the need for algorithms that withstand the challenges posed by human annotation noise.
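
A concrete way to see the contrast described in contribution 2 is to tabulate the empirical class-level transition matrix T[i, j] = P(noisy label = j | clean label = i) from the paired clean and noisy labels. A class-dependent noise model is fully specified by this single matrix, whereas in CIFAR-N the off-diagonal mass concentrates on visually similar classes and, more importantly, the matrix still averages away the per-instance variation the paper documents. The counting sketch below is illustrative rather than the authors' code, and the synthetic arrays stand in for the real CIFAR-N label arrays.

```python
import numpy as np

def transition_matrix(clean: np.ndarray, noisy: np.ndarray, num_classes: int) -> np.ndarray:
    """Estimate T[i, j] = P(noisy = j | clean = i) by simple counting."""
    T = np.zeros((num_classes, num_classes))
    np.add.at(T, (clean, noisy), 1)            # count (clean, noisy) label pairs
    row_sums = T.sum(axis=1, keepdims=True)
    return np.divide(T, row_sums, out=np.zeros_like(T), where=row_sums > 0)

# Placeholder arrays; swap in the clean/noisy label arrays loaded from CIFAR-N.
rng = np.random.default_rng(0)
clean = rng.integers(0, 3, size=10_000)
noisy = np.where(rng.random(10_000) < 0.2, (clean + 1) % 3, clean)
print(transition_matrix(clean, noisy, num_classes=3).round(2))
```

A single fixed matrix like this is exactly what class-dependent methods assume; the paper's point is that for human noise the flip probability also varies with the individual image, which no constant T can capture.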

Implications and Future Directions

Practical Implications

  • Enhancing Noise-Resilient Models: The introduction of CIFAR-N provides a controlled platform for developing and training models that must cope with real-world label noise. This is particularly relevant for applications where obtaining large-scale, perfectly labeled data is impractical.
  • Algorithm Benchmarking: Establishing a standard benchmarking procedure on CIFAR-N enables fair and comprehensive evaluation of noise-handling algorithms at a moderate scale, avoiding the computational barriers that make comparisons on existing large-scale noisy datasets uneven (a representative robust-loss baseline is sketched below).
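
Robust-loss methods are one family compared in such benchmarks. As a representative of that family (used here for illustration, not as the authors' exact setup), the sketch below implements the generalized cross-entropy (GCE) loss of Zhang and Sabuncu (2018), which interpolates between cross-entropy and mean absolute error and can be dropped into a standard CIFAR training loop in place of cross-entropy.

```python
import torch
import torch.nn.functional as F

def gce_loss(logits: torch.Tensor, targets: torch.Tensor, q: float = 0.7) -> torch.Tensor:
    """Generalized cross-entropy: (1 - p_y^q) / q.

    q -> 0 recovers cross-entropy, q = 1 gives MAE; intermediate values trade
    convergence speed against robustness to label noise.
    """
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-7)
    return ((1.0 - p_y.pow(q)) / q).mean()

# Usage in a training loop: loss = gce_loss(model(images), noisy_labels)
logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
gce_loss(logits, labels).backward()   # sanity check that gradients flow
```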

Theoretical Implications

  • Rethinking Noise Models: The findings advocate for a paradigm shift in noise modeling—from uniformly applied synthetic noise to more complex, nuanced models reflecting human annotations. This shift is essential for accurately assessing algorithmic robustness in practical scenarios.
  • Statistical Assumptions: New statistical frameworks may need to be devised to incorporate the feature-dependent noise observed in human annotations. Such frameworks could redefine theoretical approaches to learning in noisy environments (a toy feature-dependent noise model is sketched below).
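
To make "feature-dependent noise" concrete, the toy generative sketch below (entirely illustrative and not taken from the paper) flips each label toward its most confusable class, with higher flip probability for instances lying near the class boundary. This is qualitatively the behavior CIFAR-N exhibits and exactly the regime where a single class-level transition matrix is insufficient.

```python
import numpy as np

def instance_dependent_noise(features, clean, prototypes, base_rate=0.4, rng=None):
    """Toy feature-dependent noise: flip toward the nearest rival class prototype,
    more often when an instance sits close to the decision boundary."""
    rng = np.random.default_rng(0) if rng is None else rng
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)
    idx = np.arange(len(clean))
    d_true = dists[idx, clean]                   # distance to the true class prototype
    rival_dists = dists.copy()
    rival_dists[idx, clean] = np.inf
    rival = rival_dists.argmin(axis=1)           # most confusable class per instance
    d_rival = rival_dists[idx, rival]
    flip_prob = base_rate * d_true / (d_true + d_rival)   # larger near the boundary
    flips = rng.random(len(clean)) < flip_prob
    return np.where(flips, rival, clean)

# Tiny demo: three Gaussian blobs around 2-D class prototypes.
rng = np.random.default_rng(1)
prototypes = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
clean = rng.integers(0, 3, size=2_000)
features = prototypes[clean] + rng.normal(size=(2_000, 2))
noisy = instance_dependent_noise(features, clean, prototypes, rng=rng)
print("overall noise rate:", np.mean(noisy != clean))
```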

Speculative Future Directions

  • Future models might leverage more sophisticated methods of distinguishing between noise patterns across different annotation interfaces and contributors, potentially incorporating machine learning techniques for better noise filtering and correction.
  • Continued investigation into multi-label learning contexts, as well as the real-world implications of annotator disagreements and inaccuracies, could yield significant advances in both theory and practice.

In conclusion, the exploration introduced through this paper represents a pivotal step towards bridging the gap between theoretical efficacy and practical applicability in learning with noisy labels. The CIFAR-N datasets aim to motivate future research into more realistic noise models, fostering robust machine learning methods equipped to handle the intricacies of real-world data.