Limitations of Out-of-Distribution Detection in 3D Medical Image Segmentation (2306.13528v1)
Abstract: Deep learning models perform unreliably when the data comes from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection methods help to identify such data samples, preventing erroneous predictions. In this paper, we further investigate the effectiveness of OOD detection when applied to 3D medical image segmentation. We design several OOD challenges representing clinically occurring cases and show that none of the existing methods achieves acceptable performance. Methods not dedicated to segmentation fail severely in the designed setups; their best mean false positive rate at 95% true positive rate (FPR; lower is better) is 0.59. Segmentation-dedicated methods still perform suboptimally, with a best mean FPR of 0.31. To demonstrate this suboptimality, we develop a simple method called Intensity Histogram Features (IHF), which performs comparably or better in the same challenges, with a mean FPR of 0.25. Our findings highlight the limitations of the existing OOD detection methods on 3D medical images and present a promising avenue for improving them. To facilitate research in this area, we release the designed challenges as a publicly available benchmark and formulate practical criteria to test the generalization of OOD detection beyond the suggested benchmark. We also propose IHF as a solid baseline against which emerging methods can be compared.
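The metric quoted throughout the abstract, the false positive rate at 95% true positive rate, can be illustrated with a short sketch. The abstract does not state the scoring convention, so this example assumes that OOD samples are the positive class and that a higher score means "more OOD"; the function name `fpr_at_tpr` and the toy scores are hypothetical and are not taken from the paper or its benchmark code.

```python
import math

import numpy as np


def fpr_at_tpr(scores_id, scores_ood, tpr=0.95):
    """Fraction of in-distribution samples flagged as OOD at the threshold
    that detects at least `tpr` of the OOD samples.

    Assumptions (not from the paper): OOD is the positive class and a
    higher score means a sample looks more out-of-distribution.
    """
    scores_id = np.asarray(scores_id, dtype=float)
    scores_ood = np.asarray(scores_ood, dtype=float)

    # Number of OOD samples that must be detected to reach the target TPR.
    # The small epsilon guards against floating-point round-up in tpr * n.
    n_needed = math.ceil(tpr * scores_ood.size - 1e-9)
    n_needed = max(n_needed, 1)

    # Highest threshold that still flags `n_needed` OOD samples.
    threshold = np.sort(scores_ood)[::-1][n_needed - 1]

    # In-distribution samples scoring at or above the threshold are
    # false positives.
    return float(np.mean(scores_id >= threshold))


# Toy example: four in-distribution and four OOD scores.
example_fpr = fpr_at_tpr(
    scores_id=[0.1, 0.2, 0.65, 0.3],
    scores_ood=[0.9, 0.8, 0.7, 0.6],
)
```

In the toy example, all four OOD scores must be detected to reach a 95% true positive rate, so the threshold is 0.6 and one of the four in-distribution scores (0.65) is a false positive. Under this reading, a mean FPR of 0.25 would mean that roughly a quarter of in-distribution scans are wrongly flagged when 95% of OOD scans are caught.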