Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels (2301.00545v4)
Abstract: A noisy training set usually leads to degraded generalization and robustness of neural networks. In this paper, we propose a novel, theoretically guaranteed clean-sample-selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under certain conditions. In general scenarios, however, these conditions may no longer hold, and some noisy data are falsely selected as clean. To address this problem, we propose a data-adaptive method, Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) of the selected clean data. To improve efficiency, we further present a split algorithm that divides the whole training set into small pieces solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample-selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models are available at https://github.com/Yikai-Wang/Knockoffs-SPR.
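To make the SPR step concrete, here is a minimal sketch, not the authors' released code: it models one-hot labels Y as X @ B + Gamma, where Gamma is a per-sample mean-shift matrix, and alternates a least-squares update of B with row-wise soft-thresholding of Gamma, so samples whose mean-shift row shrinks to exactly zero are selected as clean. The alternating solver, the penalty weight `lam`, and the toy data are illustrative assumptions.

```python
import numpy as np

def spr_select_clean(X, Y, lam=0.5, n_iters=50):
    """Alternating solver for Y ~ X @ B + Gamma with a row-sparse Gamma.

    X: (n, d) network features; Y: (n, c) one-hot labels.
    Returns a boolean mask of samples whose mean-shift row is exactly
    zero, i.e. the samples the model treats as clean.
    """
    n, c = Y.shape
    Gamma = np.zeros((n, c))
    X_pinv = np.linalg.pinv(X)  # reused: B is a least-squares solution
    for _ in range(n_iters):
        B = X_pinv @ (Y - Gamma)          # update coefficients given Gamma
        R = Y - X @ B                     # residuals absorb label noise
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Row-wise (group) soft-thresholding: rows with small residual
        # norm are driven exactly to zero, marking those samples clean.
        Gamma = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * R
    return np.linalg.norm(Gamma, axis=1) < 1e-8

# Toy usage: 200 samples, 16-dim features, 4 classes, 20% flipped labels.
rng = np.random.default_rng(0)
n, d, c = 200, 16, 4
X = rng.normal(size=(n, d))
true_y = rng.integers(0, c, size=n)
X[np.arange(n), true_y] += 2.0            # weak class-dependent signal
flipped = rng.random(n) < 0.2
labels = np.where(flipped, (true_y + 1) % c, true_y)
clean_mask = spr_select_clean(X, np.eye(c)[labels])
print(f"selected {clean_mask.sum()}/{n} samples as clean")
```

Knockoffs-SPR then filters such a selection to control the FSR. The cutoff below is the classical knockoff+ threshold of Barber and Candès, which the paper builds on; the per-sample comparison statistics `W` that Knockoffs-SPR computes by contrasting each sample against a knockoff copy of its label are defined in the paper and only assumed abstractly here.

```python
def knockoff_threshold(W, q=0.1):
    """Smallest t with estimated FSR (1 + #{W_i <= -t}) / #{W_i >= t} <= q.

    W: array of comparison statistics, one per sample; sign convention
    (whether large positive W favors the original or the knockoff copy)
    follows the statistic's definition, which is not fixed here.
    """
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return t
    return np.inf  # no threshold achieves the target FSR level q
```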