
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care (2209.07805v4)

Published 16 Sep 2022 in cs.LG

Abstract: The COVID-19 pandemic has placed a heavy burden on healthcare systems worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed for clinical predictive tasks, such as mortality prediction for COVID-19 patients in intensive care units, using Electronic Health Record (EHR) data. Despite their initial success in certain clinical applications, there is currently a lack of benchmarking results that would allow a fair comparison and the selection of the optimal model for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks for COVID-19 patients in intensive care units: Outcome-specific length-of-stay prediction and Early mortality prediction. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models and 6 deep learning predictive models specifically designed for EHR data. We provide fair, reproducible benchmarking results using data from two real-world COVID-19 EHR datasets. One dataset is publicly available without any inquiry, and the other can be accessed on request. We deploy all experiment results and models on an online platform. We also allow clinicians and researchers to upload their data to the platform and get quick prediction results using our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.

Authors (7)
  1. Junyi Gao (20 papers)
  2. Yinghao Zhu (45 papers)
  3. Wenqing Wang (22 papers)
  4. Yasha Wang (47 papers)
  5. Wen Tang (33 papers)
  6. Ewen M. Harrison (4 papers)
  7. Liantao Ma (23 papers)
Citations (11)