Understanding and Mitigating the Bias in Sample Selection for Learning with Noisy Labels (2401.13360v3)

Published 24 Jan 2024 in cs.LG

Abstract: Learning with noisy labels aims to ensure model generalization given a label-corrupted training set. Sample selection strategies achieve promising performance by selecting a label-reliable subset for model training. In this paper, we empirically reveal that existing sample selection methods suffer from both data bias and training bias, which manifest in practice as imbalanced selected subsets and accumulated errors, respectively. However, previous studies have addressed only the training bias. To address this limitation, we propose a noIse-Tolerant Expert Model (ITEM) for debiased learning in sample selection. Specifically, to mitigate the training bias, we design a robust network architecture that integrates multiple experts. Compared with the prevailing double-branch network, our network achieves better selection and prediction performance by ensembling these experts while training fewer parameters. Meanwhile, to mitigate the data bias, we propose a mixed sampling strategy based on two weight-based data samplers. By training on a mixture of two class-discriminative mini-batches, the model mitigates the effect of the imbalanced training set while avoiding the sparse representations that sampling strategies can easily induce. Extensive experiments and analyses demonstrate the effectiveness of ITEM. Our code is available at https://github.com/1998v7/ITEM.
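
To make the mechanics described in the abstract concrete, below is a minimal PyTorch sketch of the two ideas it names: a shared backbone with several lightweight expert heads whose logits are averaged, and two weight-based data samplers whose mini-batches are mixed during training. This is an illustrative assumption of how such components could look, not the authors' released implementation; the class names, the inverse-frequency weighting, and the mixup-style batch mixing are hypothetical choices (see the official repository linked above for the actual code).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, WeightedRandomSampler


class MultiExpertNet(nn.Module):
    """Hypothetical sketch: shared backbone with several lightweight expert heads."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, num_experts: int = 3):
        super().__init__()
        self.backbone = backbone  # shared feature extractor
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        # Ensemble the experts by averaging their logits.
        logits = torch.stack([head(feats) for head in self.experts], dim=0)
        return logits.mean(dim=0)


def make_mixed_loaders(dataset, labels: torch.Tensor, selected_idx: torch.Tensor,
                       num_classes: int, batch_size: int = 64):
    """Two weight-based samplers over the selected (label-reliable) subset.

    Sampler A over-samples rare classes (inverse class frequency of the selected
    subset) to counter the imbalance that selection induces; sampler B draws from
    the subset uniformly so representations stay dense. Mini-batches from the two
    loaders are mixed in the training loop.
    """
    sel_labels = labels[selected_idx]
    class_count = torch.bincount(sel_labels, minlength=num_classes).float().clamp(min=1)
    balanced_w = (1.0 / class_count)[sel_labels]   # rarer classes drawn more often
    uniform_w = torch.ones(len(selected_idx))

    subset = Subset(dataset, selected_idx.tolist())
    loader_a = DataLoader(subset, batch_size=batch_size,
                          sampler=WeightedRandomSampler(balanced_w, len(selected_idx)))
    loader_b = DataLoader(subset, batch_size=batch_size,
                          sampler=WeightedRandomSampler(uniform_w, len(selected_idx)))
    return loader_a, loader_b


# In the training loop, the two mini-batches could be interpolated (mixup-style)
# so each update sees both class-balanced and uniformly sampled data:
#   for (xa, ya), (xb, yb) in zip(loader_a, loader_b):
#       lam = torch.distributions.Beta(1.0, 1.0).sample()
#       x = lam * xa + (1.0 - lam) * xb
#       ... compute loss against ya / yb weighted by lam ...
```

In this sketch the averaged expert heads stand in for the multi-expert architecture, and the balanced/uniform loader pair stands in for the two class-discriminative mini-batch samplers; the exact selection criterion and mixing rule are left to the original paper.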
