Extracting Clean and Balanced Subset for Noisy Long-tailed Classification (2404.06795v1)
Abstract: Real-world datasets are usually class-imbalanced and corrupted by label noise. To address the joint problem of long-tailed distribution and label noise, most previous works design a noise detector to distinguish noisy from clean samples. Despite their effectiveness, such methods may struggle to handle the joint problem in a unified way. In this work, we develop a novel pseudo-labeling method that uses class prototypes from the perspective of distribution matching, which can be solved with optimal transport (OT). By setting a manually specified probability measure over classes and using the learned transport plan to pseudo-label the training samples, the proposed method reduces the side effects of noisy and long-tailed data simultaneously. We then introduce a simple yet effective filter criterion that combines the observed labels with the pseudo labels to obtain a more balanced and less noisy subset for robust model training. Extensive experiments demonstrate that our method extracts a class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
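The abstract does not spell out the algorithm, but the described idea — match samples to class prototypes under a manually specified (e.g. balanced) class marginal, read pseudo labels off the transport plan, then keep samples whose observed and pseudo labels agree — can be sketched with a plain Sinkhorn loop. Everything below (the toy features, prototypes, labels, and the `sinkhorn` helper) is a hypothetical illustration, not the paper's implementation:

```python
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, eps=0.05, n_iters=200):
    """Entropic OT: return a plan T with row sums == row_marginal and
    column sums ~= col_marginal (alternating Sinkhorn scaling)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (K.T @ u)
        u = row_marginal / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy setup: 6 samples, 3 classes (all values hypothetical).
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))    # sample features
protos = rng.normal(size=(3, 8))   # one prototype per class

# Cost = 1 - cosine similarity between samples and class prototypes.
f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
cost = 1.0 - f @ p.T

a = np.full(6, 1 / 6)   # uniform measure over samples
b = np.full(3, 1 / 3)   # manually specified balanced class marginal
T = sinkhorn(cost, a, b)

# Pseudo-label each sample by the class receiving the most transport mass.
pseudo = T.argmax(axis=1)

# Filter: keep only samples whose observed (possibly noisy) label
# agrees with the OT pseudo label.
observed = np.array([0, 1, 2, 0, 1, 2])  # hypothetical noisy labels
keep = pseudo == observed
```

The balanced column marginal `b` is what counters the long tail: even head-heavy features cannot absorb more than their share of transport mass, so tail classes still receive pseudo labels, and the agreement filter then discards likely noisy samples.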
Authors: Zhuo Li, He Zhao, Zhen Li, Tongliang Liu, Dandan Guo, Xiang Wan