Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise (2301.01054v2)
Abstract: In recent years, deep learning has been increasingly applied to histopathology. However, while these approaches have shown great potential, deep learning models deployed in high-risk environments need to be able to judge their own uncertainty and reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images, with a focus on selective classification, where the model should abstain from classifying inputs on which it is uncertain. We conduct our experiments at the tile level, under domain shift and label noise, as well as at the slide level. We compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, Test-Time Data Augmentation, and ensembles of the latter approaches. We observe that ensembles of methods generally lead to better uncertainty estimates as well as increased robustness towards domain shift and label noise, while, contrary to results from classical computer vision benchmarks, no systematic gain from the other methods could be shown. Across methods, rejecting the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we compare these methods under varying levels of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation for histopathological data.
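Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, and Test-Time Data Augmentation all yield several softmax predictions per tile (one per ensemble member, dropout mask, sampled weight set, or augmented view). The following is a minimal sketch, in plain Python, of how such predictions could be averaged, scored, and used for selective classification by rejecting the most uncertain fraction of tiles. It assumes the per-tile predictions are already computed and uses the predictive entropy of the averaged distribution as the uncertainty score. The helper names (`mean_probs`, `predictive_entropy`, `accuracy_after_rejection`) and the toy data are hypothetical illustrations, not the authors' published framework.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers (not the paper's code). Input per tile: a list of
# softmax vectors, one per ensemble member, dropout mask, weight sample,
# or augmented view.


def mean_probs(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities over the stochastic predictions of one input."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Entropy (in nats) of the averaged prediction; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def accuracy_after_rejection(
    samples: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[int],
    reject_fraction: float,
) -> Tuple[float, int]:
    """Reject the most uncertain fraction of inputs; return (accuracy, n_kept) on the rest."""
    scored = []
    for member_probs, label in zip(samples, labels):
        probs = mean_probs(member_probs)
        prediction = max(range(len(probs)), key=probs.__getitem__)
        scored.append((predictive_entropy(probs), prediction == label))
    # Sort from most to least certain, then drop the most uncertain tail.
    scored.sort(key=lambda item: item[0])
    n_reject = round(reject_fraction * len(scored))
    kept = scored[: len(scored) - n_reject]
    if not kept:
        return float("nan"), 0
    accuracy = sum(1 for _, correct in kept if correct) / len(kept)
    return accuracy, len(kept)


if __name__ == "__main__":
    # Toy data: 4 tiles, 2 classes, 3 stochastic predictions per tile.
    samples = [
        [[0.9, 0.1], [0.95, 0.05], [0.85, 0.15]],
        [[0.2, 0.8], [0.1, 0.9], [0.15, 0.85]],
        [[0.6, 0.4], [0.4, 0.6], [0.55, 0.45]],
        [[0.7, 0.3], [0.8, 0.2], [0.75, 0.25]],
    ]
    labels = [0, 1, 1, 0]
    for fraction in (0.0, 0.25):
        acc, kept = accuracy_after_rejection(samples, labels, fraction)
        print(f"reject {fraction:.0%}: accuracy {acc:.2f} on {kept} tiles")
    # reject 0%: accuracy 0.75 on 4 tiles
    # reject 25%: accuracy 1.00 on 3 tiles
```

Sweeping `reject_fraction` from 0 towards 1 traces an accuracy-rejection curve. If the uncertainty score is informative, accuracy on the retained tiles rises as more tiles are rejected, as in the toy example where the single misclassified tile is also the most uncertain one.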
- Hendrik A. Mehrtens
- Alexander Kurz
- Tabea-Clara Bucher
- Titus J. Brinker