CSOT: Curriculum and Structure-Aware Optimal Transport for Learning with Noisy Labels (2312.06221v1)
Abstract: Learning with noisy labels (LNL) poses the challenge of training a model that generalizes well while avoiding overfitting to corrupted labels. Recent advances have achieved impressive performance by identifying clean labels and correcting corrupted labels for training. However, current approaches rely heavily on the model's predictions and evaluate each sample independently, considering neither the global nor the local structure of the sample distribution. These limitations typically produce a suboptimal solution for the identification and correction processes, which eventually causes models to overfit to incorrect labels. In this paper, we propose a novel optimal transport (OT) formulation, called Curriculum and Structure-aware Optimal Transport (CSOT). CSOT concurrently considers the inter- and intra-distribution structure of the samples to construct a robust denoising and relabeling allocator. During training, the allocator incrementally assigns reliable labels to the fraction of samples with the highest confidence, yielding labels with both global discriminability and local coherence. Notably, CSOT is a new OT formulation with a nonconvex objective function and curriculum constraints, so it is not directly compatible with classical OT solvers. To solve CSOT efficiently, we develop a lightspeed computational method that runs a scaling iteration within a generalized conditional gradient framework. Extensive experiments demonstrate the superiority of our method over the current state of the art in LNL. Code is available at https://github.com/changwxx/CSOT-for-LNL.
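For intuition about the "scaling iteration" the abstract refers to, the sketch below shows the classical Sinkhorn scaling update for entropy-regularized OT, the standard building block that conditional-gradient-style OT solvers call on each linearized subproblem. This is not the authors' CSOT solver (whose objective is nonconvex and carries curriculum constraints); it is a minimal illustration, and all names and parameter values here are illustrative.

```python
# Minimal Sinkhorn sketch (illustrative only, not the CSOT solver).
import numpy as np

def sinkhorn(C, a, b, eps=0.05, n_iters=200):
    """Approximate the entropic OT coupling between marginals a and b.

    C : (n, m) cost matrix
    a : (n,) source marginal, b : (m,) target marginal (both sum to 1)
    Returns a coupling Q = diag(u) K diag(v) whose row/column sums
    approach a and b as the iterations converge.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel of the cost
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                 # scale columns toward marginal b
        u = a / (K @ v)                   # scale rows toward marginal a
    return u[:, None] * K * v[None, :]

# Toy usage: transport between two uniform histograms.
rng = np.random.default_rng(0)
C = rng.random((4, 5))
a = np.full(4, 1 / 4)
b = np.full(5, 1 / 5)
Q = sinkhorn(C, a, b)
assert np.allclose(Q.sum(axis=1), a, atol=1e-4)  # rows match a exactly
```

In CSOT, by contrast, the curriculum constraint allocates only a chosen fraction of the total mass to the most confident samples, and the structure-aware terms make the objective nonconvex, which is why the paper wraps such scaling iterations inside a generalized conditional gradient loop rather than running Sinkhorn alone.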