A Dempster-Shafer approach to trustworthy AI with application to fetal brain MRI segmentation (2204.02779v4)
Abstract: Deep learning models for medical image segmentation can fail unexpectedly and spectacularly on pathological cases and on images acquired at centers other than those that provided the training images, producing labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of deep learning models for medical image segmentation. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on AI. In this work, we propose a theoretical framework for trustworthy AI and a practical system that can augment any backbone AI system using a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards the voxel-level labels predicted by the backbone AI that violate expert knowledge and relies on a fallback for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal MRI, consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of a state-of-the-art backbone AI for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.
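The fail-safe and fallback mechanism described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names, the binary `allowed` mask used to encode expert knowledge, and the mixing weight `eps` are all assumptions made for illustration. The sketch relies on one standard fact of Dempster-Shafer theory: combining a probability vector with a mass function that puts all its mass on a set of allowed classes (Dempster's rule) zeroes out the disallowed classes and renormalizes the rest. Mixing in a small share of the fallback beforehand is one way to keep the combination defined when the backbone puts all its probability on disallowed classes.

```python
import numpy as np


def dempster_with_allowed_set(probs, allowed):
    """Dempster's rule for a probability vector and a mass on an allowed set.

    probs:   (..., C) class probabilities per voxel.
    allowed: (..., C) binary mask, 1 where expert knowledge permits the class.
    Returns the probabilities restricted to the allowed set and renormalized.
    """
    masked = probs * allowed
    total = masked.sum(axis=-1, keepdims=True)
    if np.any(total <= 0):
        # Total conflict: Dempster's rule is undefined for these voxels.
        raise ValueError("no probability mass left on the allowed classes")
    return masked / total


def fail_safe_probs(p_backbone, p_fallback, allowed, eps=0.1):
    """Hypothetical pipeline: mix backbone and fallback, then apply the fail-safe.

    eps is an illustrative weight for the fallback, not a value from the paper.
    """
    mixed = (1.0 - eps) * p_backbone + eps * p_fallback
    return dempster_with_allowed_set(mixed, allowed)


# Two voxels, three classes (toy numbers).
p_backbone = np.array([[1.0, 0.0, 0.0],   # confident, but class 0 is not allowed here
                       [0.1, 0.7, 0.2]])  # consistent with expert knowledge
p_fallback = np.array([[0.0, 0.2, 0.8],
                       [0.3, 0.4, 0.3]])
allowed = np.array([[0, 1, 1],
                    [1, 1, 1]])

probs = fail_safe_probs(p_backbone, p_fallback, allowed)
labels = probs.argmax(axis=-1)
```

In this toy case the first voxel ends up with the fallback's distribution, because the backbone's only prediction is ruled out, while the second voxel stays dominated by the backbone.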