A Prospective Randomized Clinical Trial for Measuring Radiology Study Reporting Time on Artificial Intelligence-Based Detection of Intracranial Hemorrhage in Emergent Care Head CT (2002.12515v1)
Abstract: We propose Artificial Intelligence Prospective Randomized Observer Blinding Evaluation (AI-PROBE) for quantitative clinical performance evaluation of radiology AI systems within prospective randomized clinical trials. AI-PROBE encompasses a study design and a matching radiology IT infrastructure that randomly blinds radiologists to results provided by AI-based image analysis. To demonstrate the applicability of our evaluation framework, we present a first prospective randomized clinical trial on the effect of AI-based Intra-Cranial Hemorrhage (ICH) detection in emergent care head CT on radiology study Turn-Around Time (TAT). We acquired 620 non-contrast head CT scans from inpatient and emergency room patients at a large academic hospital. Following acquisition, scans were automatically analyzed for the presence of ICH using commercially available software (Aidoc, Tel Aviv, Israel). Cases identified as positive for ICH by AI (ICH-AI+) were flagged in radiologists' reading worklists, with flagging randomly switched off with a probability of 50%. TAT was measured as the time difference between study completion and the first clinically communicated report, with time stamps automatically retrieved from various IT systems. TATs for flagged cases (73+/-143 min) were significantly lower than TATs for non-flagged cases (132+/-193 min; p<0.05, one-sided t-test). Of the 122 ICH-AI+ cases, 105 were true positives. Total sensitivity, specificity, and accuracy over all analyzed cases were 95.0%, 96.7%, and 96.4%, respectively. We conclude that automatic identification of ICH reduces TAT for ICH in emergent care head CT and may thereby improve the timeliness of clinical management of ICH. Our results suggest that AI-PROBE can contribute to systematic quantitative evaluation of AI systems in clinical practice using clinically meaningful quantities, such as TAT or diagnostic accuracy.
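The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names (`assign_flag`, `turnaround_minutes`, `welch_t`, `diagnostic_metrics`) are hypothetical, and the Welch form of the t statistic is an assumption, since the abstract states only that a one-sided t-test was used.

```python
import random
from datetime import datetime
from math import sqrt
from statistics import mean, variance


def assign_flag(ai_positive: bool, rng: random.Random, p_flag: float = 0.5) -> bool:
    """Decide whether a case is flagged in the reading worklist.

    Only AI-positive cases are eligible; for those, flagging is left on
    with probability p_flag (0.5 in the abstract) and switched off otherwise.
    """
    return ai_positive and rng.random() < p_flag


def turnaround_minutes(study_completed: datetime, first_report: datetime) -> float:
    """TAT in minutes: first clinically communicated report minus study completion."""
    return (first_report - study_completed).total_seconds() / 60.0


def welch_t(flagged_tat: list[float], unflagged_tat: list[float]) -> float:
    """Welch t statistic (assumed variant); negative if flagged TATs are lower."""
    se = sqrt(variance(flagged_tat) / len(flagged_tat)
              + variance(unflagged_tat) / len(unflagged_tat))
    return (mean(flagged_tat) - mean(unflagged_tat)) / se


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy
```

For example, with made-up timestamps, `turnaround_minutes(datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 11, 13))` returns `73.0`. Passing a seeded `random.Random` instance to `assign_flag` makes the randomization reproducible, which is a design choice of this sketch and not something the abstract describes.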