
IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records

Published 9 Jan 2024 in cs.LG and cs.AI | (2401.04402v2)

Abstract: Electronic Health Records (EHRs) present a valuable modality for driving personalized medicine, where treatment is tailored to fit individual-level differences. For this purpose, many data-driven machine learning and statistical models rely on the wealth of longitudinal EHRs to study patients' physiological and treatment effects. However, longitudinal EHRs tend to be sparse, with a high proportion of missing values, and the missingness itself can be informative, reflecting the patient's underlying health status. The success of data-driven models for personalized medicine therefore depends heavily on how the EHR data are represented, including physiological data, treatments, and missing values. To this end, we propose a novel deep-learning model that learns the underlying patient dynamics over time across multivariate data to generate realistic, personalized values conditioned on an individual's demographic characteristics and treatments. Our proposed model, IGNITE (Individualized GeNeration of Imputations in Time-series Electronic health records), utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. In IGNITE, we further propose a novel individualized missingness mask (IMM), which helps our model generate values based on the individual's observed data and missingness patterns. We further extend IGNITE from imputing missing values to a personalized data synthesizer, which generates EHR data that were never observed or even generates new patients for various applications. We validate our model on three large publicly available datasets and show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
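To make the idea of a missingness mask and of "missing data reconstruction" concrete, here is a minimal Python sketch. It is not the IGNITE model or its individualized missingness mask (IMM), whose construction the abstract does not describe. It shows only a plain binary observed/missing mask and a reconstruction error computed on entries that were deliberately hidden, which is one common way to score imputation. The function names, the toy patient record, and the choice of mean absolute error are all assumptions made for illustration.

```python
import math

def missingness_mask(series):
    """Return a 0/1 mask per time step and feature: 1 = observed, 0 = missing (NaN)."""
    return [[0 if math.isnan(v) else 1 for v in row] for row in series]

def masked_mae(truth, imputed, eval_mask):
    """Mean absolute error over the entries where eval_mask == 1."""
    total, count = 0.0, 0
    for t_row, i_row, m_row in zip(truth, imputed, eval_mask):
        for t, i, m in zip(t_row, i_row, m_row):
            if m:
                total += abs(t - i)
                count += 1
    return total / count if count else float("nan")

nan = float("nan")

# Hypothetical patient record: 3 time steps x 2 features, NaN marks a missing value.
record = [[80.0, nan], [nan, 1.5], [90.0, nan]]
mask = missingness_mask(record)

# Hypothetical evaluation: known values are hidden (hidden == 1), imputed by
# some model, and the error is measured only on those hidden entries.
truth = [[80.0, 1.0], [85.0, 1.5], [90.0, 2.0]]
hidden = [[0, 0], [1, 0], [0, 1]]
imputed = [[80.0, 1.0], [84.0, 1.5], [90.0, 2.5]]
error = masked_mae(truth, imputed, hidden)
```

In this toy setup the mask could be passed to a model alongside the observed values, so that the pattern of missingness is available as a signal in its own right rather than being discarded.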
