Statistical validation of a deep learning algorithm for dental anomaly detection in intraoral radiographs using paired data (2402.14022v1)
Abstract: This article describes the clinical validation study setup, statistical analysis and results for a deep learning algorithm that detects dental anomalies in intraoral radiographic images: caries, apical lesions, root canal treatment defects, marginal defects at crown restorations, periodontal bone loss and calculus. The study compares the detection performance of dentists using the deep learning algorithm with the prior performance of the same dentists evaluating the images without algorithmic assistance. Calculating the marginal profit and loss of performance from the annotated paired image data quantifies the hypothesized change in sensitivity and specificity. The statistical significance of these results is established using both McNemar's test and the binomial hypothesis test. The average sensitivity increases from $60.7\%$ to $85.9\%$, while the average specificity decreases slightly from $94.5\%$ to $92.7\%$. We show that the increase in the area under the localization ROC curve (AUC) is significant (from $0.60$ to $0.86$ on average), with $95\%$ confidence intervals for the average AUC of $[0.54, 0.65]$ and $[0.82, 0.90]$, respectively. When using the deep learning algorithm for diagnostic guidance, the dentist can be $95\%$ confident that the average true population sensitivity lies between $79.6\%$ and $91.9\%$. The proposed paired data setup and statistical analysis can serve as a blueprint for thoroughly testing the effect of a modality change, such as deep learning based detection and/or segmentation, on radiographic images.
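The paired comparison described in the abstract can be illustrated with a minimal sketch. It assumes a 2x2 table of paired outcomes on positive cases (detected with and without assistance) and applies McNemar's test with Edwards' continuity correction, plus a one-sided exact binomial test on the discordant pairs. The function names and the counts are hypothetical and chosen for illustration only; they are not the study's data, and the paper's exact procedure may differ.

```python
from math import comb, erfc, sqrt


def paired_sensitivity(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Sensitivity without and with assistance from paired positive cases.

    a: detected in both readings      b: detected only with assistance (gain)
    c: detected only without it (loss) d: missed in both readings
    """
    n = a + b + c + d
    return (a + c) / n, (a + b) / n


def mcnemar_edwards(b: int, c: int) -> tuple[float, float]:
    """McNemar chi-square statistic (1 dof) with Edwards' correction, and p-value."""
    n = b + c
    if n == 0:
        return 0.0, 1.0
    # Clamp at zero so that b == c does not yield a spurious positive statistic.
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


def binomial_one_sided(b: int, c: int) -> float:
    """P(X >= b) for X ~ Binomial(b + c, 0.5): tests whether gains exceed losses."""
    n = b + c
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n


# Hypothetical counts, for illustration only.
a, b, c, d = 60, 25, 5, 10
unaided, aided = paired_sensitivity(a, b, c, d)
stat, p_mcnemar = mcnemar_edwards(b, c)
p_binomial = binomial_one_sided(b, c)
print(f"sensitivity: {unaided:.3f} -> {aided:.3f}")
print(f"McNemar statistic = {stat:.3f}, p = {p_mcnemar:.5f}")
print(f"one-sided binomial p = {p_binomial:.6f}")
```

Only the discordant pairs (`b` and `c`) enter both tests, which is why the change in sensitivity equals `(b - c) / n`: the net of the marginal gain and loss. The same construction applies to specificity when the table is built from negative cases.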
Authors: Pieter Van Leemput, Johannes Keustermans, Wouter Mollemans