
Statistical validation of a deep learning algorithm for dental anomaly detection in intraoral radiographs using paired data (2402.14022v1)

Published 1 Feb 2024 in eess.IV, cs.CV, cs.LG, and stat.AP

Abstract: This article describes the clinical validation study setup, statistical analysis and results for a deep learning algorithm which detects dental anomalies in intraoral radiographic images, more specifically caries, apical lesions, root canal treatment defects, marginal defects at crown restorations, periodontal bone loss and calculus. The study compares the detection performance of dentists using the deep learning algorithm to the prior performance of these dentists evaluating the images without algorithmic assistance. Calculating the marginal profit and loss of performance from the annotated paired image data allows for a quantification of the hypothesized change in sensitivity and specificity. The statistical significance of these results is extensively proven using both McNemar's test and the binomial hypothesis test. The average sensitivity increases from $60.7\%$ to $85.9\%$, while the average specificity slightly decreases from $94.5\%$ to $92.7\%$. We prove that the increase of the area under the localization ROC curve (AUC) is significant (from $0.60$ to $0.86$ on average), while the average AUC is bounded by the $95\%$ confidence intervals ${[}0.54, 0.65{]}$ and ${[}0.82, 0.90{]}$. When using the deep learning algorithm for diagnostic guidance, the dentist can be $95\%$ confident that the average true population sensitivity is bounded by the range $79.6\%$ to $91.9\%$. The proposed paired data setup and statistical analysis can be used as a blueprint to thoroughly test the effect of a modality change, like a deep learning based detection and/or segmentation, on radiographic images.


Summary

An Evaluation of AI-Based Dental Anomaly Detection in Intraoral Radiographs

The paper "Statistical validation of a deep learning algorithm for dental anomaly detection in intraoral radiographs using paired data" by Van Leemput et al. provides a comprehensive analysis of a deep learning algorithm designed for the detection of various dental anomalies in intraoral radiographic images (IORs). The primary anomalies under consideration include caries, apical lesions, root canal treatment defects, marginal defects at crown restorations, periodontal bone loss, and calculus. The effectiveness of the AI algorithm is measured by comparing the diagnostic performance of dentists with and without algorithmic assistance.

Methodology

The authors employed a paired data approach in a clinical validation study in which a series of intraoral images was analyzed by dentists under two separate modalities. In the first modality, the control arm, dentists evaluated the images without any AI assistance; in the second modality, the study arm, the same images were reviewed with the support of AI-generated annotations. A notable aspect of the methodology is the latency period between the two reading sessions, which minimizes recall bias and ensures a robust comparison. Statistical significance was assessed using both McNemar's test and binomial hypothesis tests, with a focus on changes in detection sensitivity and specificity.
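To illustrate how the exact (binomial) form of McNemar's test operates on paired reads of the same images, here is a minimal sketch; the discordant-pair counts are hypothetical, not taken from the study:

```python
from math import comb

def mcnemar_exact_p(b: int, c: int) -> float:
    """Exact (binomial) McNemar p-value from discordant pair counts.

    b: anomalies detected only in the study arm (with AI assistance),
    c: anomalies detected only in the control arm (without assistance).
    Under H0 (no modality effect) the discordant pairs split 50/50,
    so b follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    # Two-sided p-value: double the tail probability of the smaller count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 48 findings made only with AI, 7 only without.
p = mcnemar_exact_p(48, 7)
print(f"p = {p:.3e}")  # a very small p-value rejects H0 of equal performance
```

Only the discordant pairs matter: cases on which both arms agree carry no information about a difference between the modalities, which is what makes the paired design statistically efficient.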

Results

The study demonstrates that the AI algorithm improves diagnostic sensitivity: average sensitivity increased from 60.7% to 85.9% when dentists used the algorithm, accompanied by a minor reduction in average specificity, from 94.5% to 92.7%. These results are supported by robust statistical analyses, including confidence intervals and hypothesis testing, reinforcing the reliability of the findings.
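A minimal sketch of how per-arm sensitivity and specificity can be computed from paired, ground-truth-annotated findings; the records below are hypothetical, for illustration only:

```python
# Each record: (ground_truth, control_call, ai_assisted_call) for one
# candidate finding, where 1 = anomaly called and 0 = no anomaly called.
records = [
    (1, 1, 1), (1, 0, 1), (1, 0, 1), (1, 0, 0),  # true anomalies
    (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 1, 0),  # anomaly-free sites
]

def sens_spec(records, arm):
    """Sensitivity and specificity for one arm (0 = control, 1 = AI)."""
    tp = sum(1 for gt, *calls in records if gt == 1 and calls[arm] == 1)
    fn = sum(1 for gt, *calls in records if gt == 1 and calls[arm] == 0)
    tn = sum(1 for gt, *calls in records if gt == 0 and calls[arm] == 0)
    fp = sum(1 for gt, *calls in records if gt == 0 and calls[arm] == 1)
    return tp / (tp + fn), tn / (tn + fp)

for arm, name in [(0, "control"), (1, "AI-assisted")]:
    se, sp = sens_spec(records, arm)
    print(f"{name}: sensitivity={se:.2f}, specificity={sp:.2f}")
```

Because both arms score the same findings, the marginal gains and losses (findings whose classification flips between arms) can be read directly off such paired records, which is what the study's profit-and-loss quantification relies on.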

The research further details the increase in the area under the localization receiver operating characteristic (LROC) curve, from 0.60 to 0.86 on average, which is shown to be statistically significant. The confidence intervals for sensitivity and specificity, constructed using traditional binomial distribution methods, provide additional insight into the generalizability of the results beyond this specific study.
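As one illustration of a binomial confidence interval for a detection rate, here is a Wilson score interval sketch; the study's exact CI construction may differ, and the counts below are hypothetical:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion.

    One common 'traditional' binomial CI; unlike the naive normal
    approximation it stays inside [0, 1] for proportions near 0 or 1.
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical: 859 of 1000 true anomalies detected with AI assistance.
lo, hi = wilson_ci(859, 1000)
print(f"sensitivity 95% CI: [{lo:.3f}, {hi:.3f}]")
```

The interval narrows as the number of annotated findings grows, which is why the reported population-level bounds (e.g. 79.6% to 91.9% for sensitivity) depend directly on the study's sample size.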

Implications and Future Directions

The implications of these findings are profound, both practically and theoretically. The demonstrated efficacy of AI assistance in enhancing diagnostic accuracy holds substantial potential for clinical practice, particularly in improving patient outcomes through more precise anomaly detection. Statistically validating the added benefit of AI tools can accelerate their adoption in dental diagnostics and perhaps influence similar strategies across other areas of medical imaging and diagnostics.

The theoretical underpinnings of the paired data approach applied in this research could serve as a methodological benchmark or blueprint for future studies exploring the impact of technological interventions in clinical practice. The clear gains in sensitivity, paired with only a slight loss of specificity, suggest that this approach could be extended to other anomaly detection tasks where AI models excel at localizing and categorizing complex patterns within imagery.

Looking forward, this research can pave the way for further exploration into refining AI tools for dental care, potentially broadening the scope to include predictive modeling or real-time diagnostics. Additionally, it affirms the potential for integrating large-scale AI systems into daily dental practice, provided there is emphasis on thorough clinical validation and continuous improvement of these technologies.

This detailed validation methodology and its subsequent results advocate for continued investment in AI-based solutions to complement and enhance the diagnostic capabilities of dental practitioners. As AI continues to evolve, such studies will be foundational in ensuring these systems are not only technologically sophisticated but also clinically relevant and beneficial.
