Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition (2307.00880v1)
Abstract: In real-world scenarios, collected and annotated data often involve multiple classes and follow a long-tailed distribution. Additionally, label noise is inevitable in large-scale annotation and hinders the application of learning-based models. Although many deep learning methods have been proposed for handling long-tailed multi-label recognition or label noise separately, learning with noisy labels in long-tailed multi-label visual data has not been well studied because of the complexity of a long-tailed distribution entangled with multi-label correlation. To tackle this critical yet thorny problem, this paper focuses on reducing noise by exploiting inherent properties of multi-label classification and long-tailed learning under noisy conditions. Specifically, we propose a Stitch-Up augmentation that synthesizes a cleaner sample, directly reducing multi-label noise by stitching up multiple noisy training samples. Building on Stitch-Up, a Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions, yielding cleaner labels for more robust representation learning on noisy long-tailed data. To validate our method, we build two challenging benchmarks, VOC-MLT-Noise and COCO-MLT-Noise. Extensive experiments demonstrate the effectiveness of the proposed method: compared to a variety of baselines, it achieves superior results.
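The abstract describes Stitch-Up only at a high level. Below is a minimal, hypothetical PyTorch sketch of the core idea, assuming that two noisy samples are combined by spatial concatenation and that their multi-hot label vectors are merged by union; the number of stitched samples, the image composition strategy, and all function and argument names here are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of the Stitch-Up idea from the abstract: several noisy
# multi-label samples are combined into one synthetic sample whose label set
# is the union of the constituents' labels. The side-by-side stitching and
# two-sample setting are assumptions made for illustration.
import torch
import torch.nn.functional as F

def stitch_up(img_a, img_b, labels_a, labels_b):
    """Combine two noisy training samples into one (presumably cleaner) sample.

    img_a, img_b       : (C, H, W) tensors of the same spatial size.
    labels_a, labels_b : (num_classes,) multi-hot label vectors, possibly noisy.
    """
    # Assumption: images are stitched side by side, then resized back to the
    # original resolution so the rest of the training pipeline is unchanged.
    stitched = torch.cat([img_a, img_b], dim=2)                    # (C, H, 2W)
    stitched = F.interpolate(
        stitched.unsqueeze(0), size=img_a.shape[1:],
        mode="bilinear", align_corners=False).squeeze(0)           # (C, H, W)

    # Label union: a class is positive if it is positive in either sample, so a
    # label wrongly missing from one sample can be recovered from the other.
    stitched_labels = torch.clamp(labels_a + labels_b, max=1.0)
    return stitched, stitched_labels
```

The label union is what makes the synthetic sample "cleaner" in the missing-label sense: a class absent from one noisy annotation may still be present in the other, so the stitched label vector is less likely to contain false negatives.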