Group Benefits Instances Selection for Data Purification (2403.15694v1)

Published 23 Mar 2024 in cs.LG and cs.MM

Abstract: Manually annotating datasets for training deep models is labor-intensive and time-consuming. To avoid this cost, directly leveraging web images to construct training data becomes a natural choice. Nevertheless, the label noise present in web data usually degrades model performance. Existing methods for combating label noise are typically designed and tested on synthetic noisy datasets, and they often fail to achieve satisfactory results on real-world noisy datasets. To this end, we propose a method named GRIP to alleviate the noisy label problem for both synthetic and real-world datasets. Specifically, GRIP utilizes a group regularization strategy that estimates class soft labels to improve noise robustness. Soft label supervision reduces overfitting on noisy labels and captures inter-class similarities that benefit classification. Furthermore, an instance purification operation globally identifies noisy labels by measuring the difference between each training sample and its class soft label. Through operations at both the group and instance levels, our approach integrates the advantages of noise-robust and noise-cleaning methods and markedly alleviates the performance degradation caused by noisy labels. Comprehensive experimental results on synthetic and real-world datasets demonstrate the superiority of GRIP over existing state-of-the-art methods.
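The two operations the abstract describes — estimating class soft labels at the group level, then purifying instances whose predictions diverge from their class's soft label — can be sketched as follows. This is an illustrative reconstruction only, not the paper's actual formulation: the use of per-class mean predictions as soft labels, Jensen-Shannon divergence as the difference measure, and a fixed divergence threshold are all assumptions made for the sketch.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two probability vectors
    # (symmetric, bounded; a common choice for comparing predictions).
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def estimate_class_soft_labels(preds, labels, num_classes):
    # Group level: average the model's predicted distributions over all
    # samples currently assigned to each class. The resulting soft label
    # encodes inter-class similarity (off-diagonal mass).
    soft = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            soft[c] = preds[mask].mean(axis=0)
    return soft

def purify(preds, labels, soft_labels, threshold):
    # Instance level: flag a sample as noisy when its prediction diverges
    # too far from the soft label of its annotated class.
    return np.array([
        js_divergence(preds[i], soft_labels[labels[i]]) < threshold
        for i in range(len(labels))
    ])
```

In this toy setup, a sample whose annotated class disagrees sharply with its predicted distribution (e.g. labeled class 0 but predicted class 1 with high confidence) exceeds the divergence threshold and is filtered out, while consistent samples are kept for training.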

