MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records (2312.13454v3)

Published 20 Dec 2023 in cs.LG and stat.ME

Abstract: Survival models can help medical practitioners evaluate the prognostic importance of clinical variables for patient outcomes such as mortality or hospital readmission, and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high-dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG that simultaneously integrates heterogeneous EHR data and models the survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with the Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using PheCode concepts, such that each topic can be identified with exactly one PheCode-associated phenotype; and (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG on a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) dataset, consisting of 8,211 subjects with 75,187 outpatient claim records covering 1,767 unique ICD codes; and MIMIC-III, consisting of 1,458 subjects with multi-modal EHR records. Compared with the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC of 0.89 on the simulated dataset and 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among CHD patients after their first heart failure hospitalization, and critical brain injuries with increased mortality among MIMIC-III patients after ICU discharge.
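The first contribution, coupling topic inference with the Cox proportional hazards likelihood, can be illustrated by the survival term on its own. The sketch below is a minimal, hedged illustration, not the MixEHR-SurG implementation: it assumes that a patient's log-hazard ratio is a linear function w·θ of their K topic proportions θ (a common choice in supervised topic models), and it handles tied event times with the Breslow approximation. It evaluates the partial log-likelihood l(w) = sum over observed deaths i of [ w·θ_i - log sum over j with t_j >= t_i of exp(w·θ_j) ]. The function names are hypothetical, and the topic inference and PheCode guidance that the paper combines with this term are not shown.

```python
import math
from typing import Sequence


def linear_predictor(theta: Sequence[float], w: Sequence[float]) -> float:
    """Risk score eta = w . theta for one patient's topic proportions theta (assumed parameterization)."""
    if len(theta) != len(w):
        raise ValueError("theta and w must have the same length (number of topics K)")
    return sum(t * wk for t, wk in zip(theta, w))


def cox_partial_log_likelihood(
    thetas: Sequence[Sequence[float]],
    times: Sequence[float],
    events: Sequence[int],
    w: Sequence[float],
) -> float:
    """Breslow-style Cox partial log-likelihood with topic proportions as covariates.

    thetas[i]: topic proportions of patient i (length K)
    times[i]:  observed time (event or censoring)
    events[i]: 1 if the event (e.g. death) was observed, 0 if censored
    """
    etas = [linear_predictor(th, w) for th in thetas]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored patients contribute only through the risk sets
        # Risk set: every patient still under observation at time t_i (ties included)
        risk = [etas[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Log-sum-exp with the maximum subtracted, for numerical stability
        m = max(risk)
        log_denom = m + math.log(sum(math.exp(e - m) for e in risk))
        ll += etas[i] - log_denom
    return ll


if __name__ == "__main__":
    # Toy data: three patients, two hypothetical topics
    thetas = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    times = [2.0, 5.0, 3.0]
    events = [1, 0, 1]
    print(cox_partial_log_likelihood(thetas, times, events, [1.0, -1.0]))
```

With all weights set to zero every patient has the same risk, so each observed death contributes minus the log of its risk-set size. The risk sets are recomputed by a linear scan for clarity; sorting patients by time would reduce the cost from O(n^2) to O(n log n).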
