GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs (2305.12788v3)
Abstract: Clinical predictive models often rely on patients' electronic health records (EHR), but integrating medical knowledge to enhance predictions and decision-making is challenging, because personalized predictions require personalized knowledge graphs (KGs), which are difficult to generate from patient EHR data. To address this, we propose GraphCare, an open-world framework that uses external KGs to improve EHR-based predictions. Our method extracts knowledge from large language models (LLMs) and external biomedical KGs to build patient-specific KGs, which are then used to train our proposed Bi-attention AugmenTed (BAT) graph neural network (GNN) for healthcare predictions. On two public datasets, MIMIC-III and MIMIC-IV, GraphCare outperforms baselines on four vital healthcare prediction tasks: mortality, readmission, length of stay (LOS), and drug recommendation. On MIMIC-III, it improves AUROC by 17.6% and 6.6% for mortality and readmission prediction, respectively, and F1-score by 7.9% and 10.8% for LOS prediction and drug recommendation, respectively. Notably, GraphCare shows a substantial advantage when training data are limited. Our findings highlight the potential of external KGs for healthcare prediction tasks and the promise of GraphCare for generating personalized KGs that support personalized medicine.
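The step of building a patient-specific KG from knowledge about individual medical concepts can be illustrated with a small sketch. It assumes that a concept-level KG, a list of (head, relation, tail) triples obtained for example by prompting an LLM or by sampling an external biomedical KG, is already available for each medical code. The function name `compose_patient_kg`, the toy concept names standing in for medical codes, and the relations are hypothetical and are not taken from GraphCare's implementation; the sketch also does not cover the BAT GNN.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A KG edge as (head, relation, tail).
Triple = Tuple[str, str, str]


def compose_patient_kg(
    visits: List[List[str]],
    concept_kgs: Dict[str, List[Triple]],
) -> Tuple[Set[Triple], Dict[str, Set[int]]]:
    """Hypothetical helper: merge the concept-level KGs of all codes in a patient's visits.

    Returns the de-duplicated set of triples and, for every node, the
    indices of the visits whose codes contributed that node.
    """
    triples: Set[Triple] = set()
    node_visits: Dict[str, Set[int]] = defaultdict(set)
    for visit_idx, codes in enumerate(visits):
        for code in codes:
            # The code itself is always a node, even if it has no concept KG.
            node_visits[code].add(visit_idx)
            for head, relation, tail in concept_kgs.get(code, []):
                triples.add((head, relation, tail))  # the set removes duplicates across visits
                node_visits[head].add(visit_idx)
                node_visits[tail].add(visit_idx)
    return triples, dict(node_visits)


# Hypothetical toy data: two concept KGs and a patient with two visits.
concept_kgs = {
    "heart failure": [
        ("heart failure", "treated_with", "furosemide"),
        ("heart failure", "causes", "edema"),
    ],
    "furosemide": [("furosemide", "may_cause", "hypokalemia")],
}
visits = [["heart failure"], ["heart failure", "furosemide"]]

patient_triples, node_visits = compose_patient_kg(visits, concept_kgs)
print(len(patient_triples))                # 3 unique triples
print(sorted(node_visits["furosemide"]))   # [0, 1]
```

Keeping per-node visit indices, rather than only a flat triple set, preserves which visit each piece of knowledge came from, so a downstream model could weight visits differently; whether and how BAT uses such information is not shown here.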