Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis
(2202.10553v3)
Published 16 Feb 2022 in cs.LG, cs.AI, cs.CV, and eess.IV
Abstract: Explainable artificial intelligence (XAI) is essential for enabling clinical users to get informed decision support from AI and comply with evidence-based medical practice. Applying XAI in clinical settings requires proper evaluation criteria to ensure the explanation technique is both technically sound and clinically useful, but specific support is lacking to achieve this goal. To bridge the research gap, we propose the Clinical XAI Guidelines that consist of five criteria a clinical XAI needs to be optimized for. The guidelines recommend choosing an explanation form based on Guideline 1 (G1) Understandability and G2 Clinical relevance. For the chosen explanation form, its specific XAI technique should be optimized for G3 Truthfulness, G4 Informative plausibility, and G5 Computational efficiency. Following the guidelines, we conducted a systematic evaluation on a novel problem of multi-modal medical image explanation with two clinical tasks, and proposed new evaluation metrics accordingly. Sixteen commonly-used heatmap XAI techniques were evaluated and found to be insufficient for clinical use due to their failure in G3 and G4. Our evaluation demonstrated the use of Clinical XAI Guidelines to support the design and evaluation of clinically viable XAI.
The paper introduces the Clinical Explainable AI Guidelines, a set of five criteria for evaluating XAI techniques in medical image analysis based on clinical utility.
Evaluation of 16 common heatmap XAI techniques against these guidelines shows that they often lack the truthfulness and informative plausibility required for clinical application.
The study introduces modality-specific feature importance (MSFI), a novel metric to quantify the clinical plausibility of explanations in multi-modal medical imaging.
This paper introduces the Clinical Explainable AI Guidelines, a set of five criteria that clinical XAI (explainable artificial intelligence) techniques for medical image analysis should be optimized for. The guidelines aim to ensure that explanation techniques are both technically sound and clinically useful. The authors evaluate 16 commonly used heatmap XAI techniques against the guidelines and show that they fall short for clinical application.
The Clinical XAI Guidelines consist of the following five criteria:
G1 Understandability: Explanations should be easily understood by clinical users without requiring technical expertise.
G2 Clinical relevance: Explanations should align with physicians' clinical decision-making processes and support their clinical reasoning.
G3 Truthfulness: Explanations should accurately reflect the AI model's decision-making process.
G4 Informative plausibility: User assessment of explanation plausibility should provide insights into AI decision quality, including potential flaws or biases.
G5 Computational efficiency: Explanations should be generated within a clinically acceptable timeframe.
The authors conducted a systematic evaluation of 16 heatmap techniques, assessing their adherence to the proposed guidelines across two clinical tasks. The evaluation revealed that while existing heatmap methods generally meet G1 and partially meet G2, they often fail to meet G3 and G4, indicating their inadequacy for clinical use.
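As a concrete illustration of how G3 might be checked, the sketch below derives a reference importance for each imaging modality by ablating it and measuring the drop in the model's output, then rank-correlates that with the importance the heatmap assigns to the same modality. This is only a minimal sketch in the spirit of the paper's ablation-based truthfulness analysis, not the authors' exact protocol; `predict`, `modalities`, `heatmaps`, and `target_class` are placeholder names.

```python
# Minimal sketch of a G3 (truthfulness) check for a multi-modal image model.
# Reference importance per modality = drop in the model's output when that
# modality is zeroed out; explanation importance per modality = positive
# heatmap mass on that modality. A truthful heatmap should rank modalities
# similarly to the ablation reference.
import numpy as np
from scipy.stats import spearmanr

def ablation_modality_importance(predict, modalities, target_class):
    """Output drop when each modality is replaced with zeros."""
    base = predict(modalities)[target_class]
    drops = []
    for m in range(len(modalities)):
        ablated = [np.zeros_like(x) if i == m else x for i, x in enumerate(modalities)]
        drops.append(base - predict(ablated)[target_class])
    return np.array(drops)

def heatmap_modality_importance(heatmaps):
    """Importance the explanation assigns: sum of positive heatmap values per modality."""
    return np.array([np.clip(h, 0.0, None).sum() for h in heatmaps])

def truthfulness_correlation(predict, modalities, heatmaps, target_class):
    reference = ablation_modality_importance(predict, modalities, target_class)
    explained = heatmap_modality_importance(heatmaps)
    rho, _ = spearmanr(reference, explained)  # rank agreement between the two rankings
    return rho

# Example (hypothetical): four MRI sequences and one heatmap per sequence
# rho = truthfulness_correlation(model_fn, [t1, t1c, t2, flair], [h1, h2, h3, h4], target_class=1)
```

A correlation near 1 suggests the heatmap ranks modalities consistently with how much the model actually relies on them; low or negative values point to a G3 failure.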
The paper also addresses the clinically relevant but technically underexplored problem of multi-modal medical image explanation. To evaluate explanations in this setting, the authors introduce a metric called modality-specific feature importance (MSFI), which quantifies and automates physicians' assessment of explanation plausibility.
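The exact MSFI definition is given in the paper; the sketch below only illustrates the general idea under simplifying assumptions: for each modality, compute the fraction of positive heatmap mass that falls inside a clinician-defined feature mask (e.g., a tumor segmentation), then average those fractions weighted by per-modality importance. The function and argument names here are hypothetical.

```python
# Illustrative, simplified score in the spirit of MSFI (not the paper's exact formula):
# reward heatmaps that concentrate on clinically important regions, especially in the
# modalities that matter most for the decision.
import numpy as np

def msfi_like_score(heatmaps, feature_masks, modality_weights):
    """
    heatmaps:         one saliency map per modality (arrays of the same shape as the image)
    feature_masks:    binary masks of clinically important regions, one per modality
    modality_weights: per-modality importance weights (e.g., from ablation or annotation)
    """
    weights = np.asarray(modality_weights, dtype=float)
    if weights.sum() == 0:
        return 0.0
    portions = []
    for h, mask in zip(heatmaps, feature_masks):
        pos = np.clip(h, 0.0, None)          # keep positive attributions only
        total = pos.sum()
        inside = (pos * mask).sum()          # heatmap mass inside the feature mask
        portions.append(inside / total if total > 0 else 0.0)
    return float((weights * np.asarray(portions)).sum() / weights.sum())  # in [0, 1]
```

A score near 1 indicates the heatmap highlights the clinically important regions of the modalities that matter for the decision, which is what physicians would judge as plausible; because such a score is computed from segmentation masks rather than manual ratings, it can be run automatically over an entire test set.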
The authors evaluate 16 post-hoc XAI algorithms, which are grouped into gradient-based and perturbation-based methods: