PRISM: Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration for EHR Data Sparsity Mitigation (2309.04160v5)

Published 8 Sep 2023 in cs.LG and cs.AI

Abstract: Electronic Health Records (EHRs) contain a wealth of patient data; however, the sparsity of EHR data often presents significant challenges for predictive modeling. Conventional imputation methods inadequately distinguish between real and imputed data, leading to potential inaccuracies in patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data by leveraging prototype representations of similar patients, thus ensuring compact representations that preserve patient information. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature considering its missing status. Additionally, PRISM introduces a new patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance in predicting in-hospital mortality and 30-day readmission, showcasing its effectiveness in handling EHR data sparsity. For the sake of reproducibility and further research, we have made the code publicly available at https://github.com/yhzhu99/PRISM.
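
To make the idea of a confidence-aware patient similarity concrete, here is a minimal sketch. It is not the PRISM implementation: the function names, the fixed `imputed_conf` weight, and the weighted-cosine form are all assumptions chosen for illustration, whereas the abstract describes a learned feature confidence module. The sketch only shows how down-weighting imputed features keeps them from dominating a similarity score.

```python
import numpy as np


def confidence_from_mask(observed, imputed_conf=0.2):
    """Map a boolean observed-mask to per-feature confidence.

    Hypothetical rule for illustration only: observed values get 1.0,
    imputed values get a fixed lower weight (imputed_conf).
    """
    observed = np.asarray(observed, dtype=bool)
    return np.where(observed, 1.0, imputed_conf)


def weighted_cosine(x_a, x_b, conf_a, conf_b):
    """Cosine similarity with each feature weighted by conf_a * conf_b.

    A feature counts fully only when both patients' values are trusted.
    Returns 0.0 if either weighted vector has zero norm.
    """
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    w = np.asarray(conf_a, dtype=float) * np.asarray(conf_b, dtype=float)
    num = np.sum(w * x_a * x_b)
    den = np.sqrt(np.sum(w * x_a ** 2)) * np.sqrt(np.sum(w * x_b ** 2))
    return float(num / den) if den > 0 else 0.0


# Two toy patients who agree on the observed features; patient A's
# third feature is missing and was imputed as 0.0 (assumed values).
x_a = np.array([1.0, 2.0, 0.0])
x_b = np.array([1.0, 2.0, 5.0])
conf_a = confidence_from_mask([True, True, False])
conf_b = confidence_from_mask([True, True, True])
ones = np.ones(3)

plain = weighted_cosine(x_a, x_b, ones, ones)
weighted = weighted_cosine(x_a, x_b, conf_a, conf_b)
print(f"plain cosine: {plain:.3f}, confidence-weighted: {weighted:.3f}")
```

In this toy case the weighted score is higher than the plain cosine, because the disagreement comes entirely from an imputed value that the weighting discounts.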

Authors (7)
  1. Yinghao Zhu (45 papers)
  2. Zixiang Wang (17 papers)
  3. Long He (17 papers)
  4. Shiyun Xie (6 papers)
  5. Liantao Ma (23 papers)
  6. Chengwei Pan (30 papers)
  7. Xiaochen Zheng (29 papers)