
Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels (2301.00545v4)

Published 2 Jan 2023 in cs.LG and cs.CV

Abstract: A noisy training set usually degrades the generalization and robustness of neural networks. In this paper, we propose a novel, theoretically guaranteed clean sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under certain conditions. In general scenarios, these conditions may no longer be satisfied, and some noisy data are falsely selected as clean. To solve this problem, we propose a data-adaptive method, Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) of the selected clean data. To improve efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models are available at https://github.com/Yikai-Wang/Knockoffs-SPR.
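The abstract compresses two mechanisms worth unpacking. First, SPR models the one-hot labels as a linear function of the network features plus a per-sample mean-shift term: samples whose mean-shift parameters are driven exactly to zero by a sparsity penalty are treated as clean. The sketch below is not the authors' implementation (see the linked repository for that); it is a minimal NumPy illustration of the mean-shift idea, in which the alternating least-squares/group-soft-thresholding scheme and the penalty strength `lam` are illustrative assumptions.

```python
import numpy as np

def spr_select_clean(X, Y, lam=0.5, n_iters=50):
    """Flag samples as clean via a row-sparse mean-shift regression (sketch).

    X: (n, d) network features; Y: (n, c) one-hot labels.
    Model: Y ≈ X @ B + Gamma, penalized by lam * sum_i ||Gamma[i]||_2 and
    solved by block coordinate descent. Rows of Gamma shrunk exactly to
    zero mark samples the model explains without a label shift, i.e. the
    samples treated as clean.
    """
    n, d = X.shape
    Gamma = np.zeros_like(Y, dtype=float)
    XtX = X.T @ X + 1e-3 * np.eye(d)  # small ridge for numerical stability
    for _ in range(n_iters):
        # B-step: closed-form least squares on the de-shifted labels.
        B = np.linalg.solve(XtX, X.T @ (Y - Gamma))
        # Gamma-step: proximal operator of the group penalty, i.e.
        # row-wise soft-thresholding of the residual.
        R = Y - X @ B
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        Gamma = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * R
    return np.linalg.norm(Gamma, axis=1) == 0.0  # True = selected as clean
```

A larger `lam` shrinks more rows of Gamma to zero and so flags more samples as clean; choosing that trade-off by hand is exactly what Knockoffs-SPR avoids. Second, the knockoff filter turns signed per-sample comparison statistics W (constructed in the paper from paired label/knockoff-label regressions; the stand-in W here is an assumption) into a selection whose FSR is controlled at a target level q via the generic knockoff+ threshold:

```python
def knockoff_threshold(W, q=0.2):
    """Knockoff+ threshold: the smallest t such that
    (1 + #{i: W_i <= -t}) / max(#{i: W_i >= t}, 1) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # nothing can be selected at this target rate

# Samples with W_i >= knockoff_threshold(W, q) are kept as clean.
```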
