Contrastive-Based Deep Embeddings for Label Noise-Resilient Histopathology Image Classification (2404.07605v1)
Abstract: Recent advances in deep learning have proven highly effective in medical image classification, notably in histopathology. However, noisy labels remain a critical challenge in histopathology image classification, where accurate annotations are vital for training robust deep learning models. Deep neural networks can easily overfit label noise, which severely degrades model performance. Although numerous public pathology foundation models have emerged recently, none has been evaluated for its resilience to label noise. Through thorough empirical analyses across multiple datasets, we show that embeddings extracted from foundation models trained in a self-supervised contrastive manner are resilient to label noise. We demonstrate that training with such embeddings substantially enhances label noise robustness compared with non-contrastive embeddings as well as commonly used noise-resilient methods. Our results underline the superiority of contrastive learning in mitigating the label noise challenge. Code is publicly available at https://github.com/LucasDedieu/NoiseResilientHistopathology.
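The evaluation setup described in the abstract, training a classifier on frozen embeddings under corrupted labels, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the embeddings are synthetic Gaussian clusters standing in for foundation-model features, the noise model is symmetric (uniform) label flipping, and the helper names `inject_symmetric_noise`, `train_linear_probe`, and `accuracy` are hypothetical.

```python
import numpy as np


def inject_symmetric_noise(labels, noise_rate, num_classes, rng):
    """Flip a fixed fraction of labels to a different, uniformly chosen class.

    Assumption: symmetric noise; the abstract does not specify the noise model.
    """
    labels = np.asarray(labels)
    noisy = labels.copy()
    n_flip = int(round(noise_rate * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] modulo num_classes always lands on a different class.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy


def train_linear_probe(x, y, num_classes, lr=0.05, epochs=200):
    """Fit a softmax linear classifier on fixed embeddings with full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def accuracy(x, y, w, b):
    """Fraction of samples whose arg-max prediction matches the label."""
    return float(np.mean(np.argmax(x @ w + b, axis=1) == y))


# Synthetic stand-in for frozen embeddings (NOT real histopathology features).
rng = np.random.default_rng(0)
num_classes, dim = 3, 16
centers = rng.normal(scale=3.0, size=(num_classes, dim))
y_train = rng.integers(0, num_classes, size=600)
y_test = rng.integers(0, num_classes, size=300)
x_train = centers[y_train] + rng.normal(size=(600, dim))
x_test = centers[y_test] + rng.normal(size=(300, dim))

# Train on noisy labels, evaluate on clean test labels.
y_noisy = inject_symmetric_noise(y_train, noise_rate=0.4, num_classes=num_classes, rng=rng)
w, b = train_linear_probe(x_train, y_noisy, num_classes)
clean_test_acc = accuracy(x_test, y_test, w, b)
```

To compare backbones in this style, one would replace the synthetic arrays with embeddings extracted from each model, sweep `noise_rate`, and report `clean_test_acc` per setting. The numbers produced by this toy data say nothing about the paper's results.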