Pulmonologists-Level lung cancer detection based on standard blood test results and smoking status using an explainable machine learning approach (2402.09596v1)
Abstract: Lung cancer (LC) remains the primary cause of cancer-related mortality, largely due to late-stage diagnoses. Effective strategies for early detection are therefore of paramount importance. In recent years, machine learning (ML) has demonstrated considerable potential in healthcare by facilitating the detection of various diseases. In this retrospective development and validation study, we developed an ML model based on dynamic ensemble selection (DES) for LC detection. The model leverages standard blood sample analysis and smoking history data from a large population at risk in Denmark. The study includes all patients examined on suspicion of LC in the Region of Southern Denmark from 2009 to 2018. We validated and compared the predictions of the DES model with diagnoses provided by five pulmonologists. Among the 38,944 patients, 9,940 had complete data, of which 2,505 (25%) had LC. The DES model achieved an area under the ROC curve of 0.77 ± 0.01, sensitivity of 76.2% ± 2.4%, specificity of 63.8% ± 2.3%, positive predictive value of 41.6% ± 1.2%, and F1-score of 53.8% ± 1.1%. The DES model outperformed all five pulmonologists, achieving a sensitivity 9% higher than their average. The model identified smoking status, age, total calcium levels, neutrophil count, and lactate dehydrogenase as the most important factors for the detection of LC. The results highlight the successful application of the ML approach in detecting LC, surpassing pulmonologists' performance. Incorporating clinical and laboratory data in future risk assessment models can improve decision-making and facilitate timely referrals.
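The reported metrics are linked by standard identities, so they can be cross-checked against one another. The sketch below is not part of the study; it only plugs the abstract's mean values into the textbook definitions of the F1-score (harmonic mean of positive predictive value and sensitivity) and of positive predictive value as a function of sensitivity, specificity, and prevalence. The function names are illustrative assumptions, and small gaps against the reported figures are to be expected because those figures are rounded and carry a ± spread.

```python
# Illustrative consistency check on the metrics quoted in the abstract.
# Function names are hypothetical; the formulas are the standard definitions.

def f1_from_ppv_and_sensitivity(ppv: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """PPV implied by sensitivity, specificity, and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Mean values taken from the abstract.
sensitivity = 0.762
specificity = 0.638
ppv_reported = 0.416
prevalence = 2505 / 9940  # LC cases among patients with complete data

f1 = f1_from_ppv_and_sensitivity(ppv_reported, sensitivity)
ppv_implied = ppv_from_prevalence(sensitivity, specificity, prevalence)

print(f"F1 from reported PPV and sensitivity: {f1:.3f}")
print(f"PPV implied by sensitivity, specificity, prevalence: {ppv_implied:.3f}")
```

Under these assumptions the recomputed F1 lands at about 0.538, matching the reported 53.8%, and the implied positive predictive value comes out near 0.415, close to the reported 41.6%.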