An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets (2312.02200v1)
Abstract: Major advancements in computer vision can primarily be attributed to the use of labeled datasets. However, acquiring labels for datasets often introduces errors that can harm model performance. Recent works have proposed methods to automatically identify mislabeled images, but strategies for effectively applying them to real-world datasets have been sparsely explored. Towards improved data-centric methods for cleaning real-world vision datasets, we first conduct more than 200 experiments carefully benchmarking recently developed automated mislabel detection methods on multiple datasets under a variety of synthetic and real noise settings with varying noise levels. We compare these methods to a Simple and Efficient Mislabel Detector (SEMD) that we craft, and find that SEMD performs similarly to or outperforms prior mislabel detection approaches. We then apply SEMD to multiple real-world computer vision datasets and test how dataset size, mislabel removal strategy, and mislabel removal amount further affect model performance after retraining on the cleaned data. With careful design of the approach, we find that mislabel removal leads to per-class performance improvements of up to 8% for a retrained classifier in smaller data regimes.
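The abstract does not spell out the internals of SEMD, but the general recipe behind automated mislabel detectors of this kind (score each training example with a model that did not train on it, then flag the examples whose given labels receive the lowest confidence) can be illustrated with a minimal sketch. The linear probe, fold count, and 5% removal fraction below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a confidence-based mislabel detector (illustrative only;
# the probe model, fold count, and removal fraction are assumptions, not the
# exact SEMD setup from the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def flag_possible_mislabels(features, labels, removal_fraction=0.05, n_folds=5):
    """Rank examples by the out-of-fold confidence assigned to their given
    label and flag the lowest-confidence fraction as candidate mislabels.

    `labels` is assumed to be integer class indices 0..K-1.
    """
    probe = LogisticRegression(max_iter=1000)
    # Out-of-fold predicted probabilities, so each example is scored by a
    # model that never saw it during training.
    pred_probs = cross_val_predict(
        probe, features, labels, cv=n_folds, method="predict_proba"
    )
    # Self-confidence: probability the probe assigns to the annotated label.
    self_confidence = pred_probs[np.arange(len(labels)), labels]
    n_flag = int(removal_fraction * len(labels))
    # Indices of the least-confident examples, i.e., likely mislabels.
    return np.argsort(self_confidence)[:n_flag]


# Example usage with random placeholder data (in practice, replace with
# pretrained image embeddings and the dataset's labels).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(1000, 64))
    labels = rng.integers(0, 10, size=1000)
    flagged = flag_possible_mislabels(features, labels, removal_fraction=0.05)
    print(f"Flagged {len(flagged)} candidate mislabels for review or removal")
```

Out-of-fold scoring matters here: a model evaluated on its own training labels tends to memorize noisy annotations and assign them high confidence, masking exactly the mislabels one is trying to find.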