Optimizing Mortality Prediction for ICU Heart Failure Patients: Leveraging XGBoost and Advanced Machine Learning with the MIMIC-III Database (2409.01685v1)
Abstract: Heart failure affects millions of people worldwide, significantly reducing quality of life and leading to high mortality rates. Despite extensive research, the relationship between heart failure and mortality rates among ICU patients is not fully understood, indicating the need for more accurate prediction models. This study analyzed data from 1,177 patients over 18 years old from the MIMIC-III database, identified using ICD-9 codes. Preprocessing steps included handling missing data, removing duplicates, treating skewness, and using oversampling techniques to address class imbalance. Through rigorous feature selection using Variance Inflation Factor (VIF), expert clinical input, and ablation studies, 46 key features were identified to enhance model performance. Our analysis compared several machine learning models, including Logistic Regression, Support Vector Machine (SVM), Random Forest, LightGBM, and XGBoost. XGBoost emerged as the superior model, achieving a test AUC-ROC of 0.9228 (95% CI 0.8748 to 0.9613), significantly outperforming our previous work (AUC-ROC of 0.8766) and the best results reported in existing literature (AUC-ROC of 0.824). The improved model's success is attributed to advanced feature selection methods, robust preprocessing techniques, and comprehensive hyperparameter optimization through grid search. SHAP analysis and feature importance evaluations based on XGBoost highlighted key variables such as leucocyte count and red cell distribution width (RDW), providing valuable insights into the clinical factors influencing mortality risk. This framework offers significant support for clinicians, enabling them to identify high-risk ICU heart failure patients and improve patient outcomes through timely and informed interventions.
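The abstract reports a test AUC-ROC together with a 95% confidence interval but does not say how the interval was obtained. The sketch below shows one common way to produce such a pair: AUC-ROC computed from pairwise comparisons (the Mann-Whitney formulation) and a percentile bootstrap over the test set. The percentile bootstrap, the function names `auc_roc` and `bootstrap_auc_ci`, and the toy labels and scores are all assumptions made for illustration; none of them come from the paper or from MIMIC-III.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC-ROC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        # Resample test cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains a single class
        aucs.append(auc_roc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[max(int((1 - alpha / 2) * n_boot) - 1, 0)]
    return lower, upper


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(toy_labels, toy_scores))  # 0.75
print(bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200))
```

With a real test set, `labels` would hold the observed in-hospital mortality outcomes and `scores` the model's predicted probabilities. The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of about a thousand patients but would call for a rank-based implementation on larger data.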
- Negin Ashrafi
- Armin Abdollahi
- Jiahong Zhang
- Maryam Pishgar