Mitigating Memorization of Noisy Labels by Clipping the Model Prediction (2212.04055v3)

Published 8 Dec 2022 in cs.LG and cs.AI

Abstract: In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks. Cross Entropy (CE) loss has been shown to be non-robust to noisy labels because it is unbounded. To alleviate this issue, existing works typically design specialized robust losses that satisfy the symmetric condition, which usually leads to underfitting. In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses. Specifically, we propose logit clipping (LogitClip), which clamps the norm of the logit vector to ensure that it is upper bounded by a constant. In this manner, CE loss equipped with our LogitClip method is effectively bounded, mitigating overfitting to examples with noisy labels. Moreover, we present theoretical analyses to certify the noise-tolerant ability of LogitClip. Extensive experiments show that LogitClip not only significantly improves the noise robustness of CE loss, but also broadly enhances the generalization performance of popular robust losses.
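
The abstract describes LogitClip as clamping the norm of the logit vector so that it is upper bounded by a constant, after which a standard loss such as CE is applied to the clipped logits. Below is a minimal PyTorch-style sketch of that idea; the norm order `p`, the threshold `tau`, and the small epsilon are illustrative assumptions, not settings taken from the paper.

```python
import torch
import torch.nn.functional as F

def logit_clip(logits: torch.Tensor, tau: float = 1.0, p: float = 2.0) -> torch.Tensor:
    """Clamp the p-norm of each logit vector to at most tau.

    Sketch of the idea described in the abstract; tau, p, and the epsilon
    are illustrative assumptions rather than values from the paper.
    """
    norms = logits.norm(p=p, dim=-1, keepdim=True)          # per-example logit norm
    scale = torch.clamp(norms, max=tau) / (norms + 1e-12)   # < 1 only when the norm exceeds tau
    return logits * scale                                   # rescale; small logit vectors pass through unchanged

# Usage: apply before any standard loss, e.g. cross entropy.
logits = torch.randn(8, 10)             # batch of 8 examples, 10 classes
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(logit_clip(logits, tau=1.0), labels)
```

Because the rescaling only shrinks logit vectors whose norm exceeds the threshold, the downstream CE loss is bounded on every example, which is the property the paper uses to argue noise tolerance.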
