HealthGAT: Node Classifications in Electronic Health Records using Graph Attention Networks (2403.18128v1)
Abstract: While electronic health records (EHRs) are widely used across various applications in healthcare, most applications use EHRs in their raw (tabular) format. Relying on raw data or only simple pre-processing can greatly limit the performance, or even the applicability, of downstream tasks that use EHRs. To address this challenge, we present HealthGAT, a novel graph attention network framework that uses a hierarchical approach to generate embeddings from EHRs, surpassing traditional graph-based methods. Our model iteratively refines the embeddings for medical codes, resulting in improved EHR data analysis. We also introduce customized EHR-centric auxiliary pre-training tasks to leverage the rich medical knowledge embedded within the data. This approach provides a comprehensive analysis of complex medical relationships and offers a significant advance over standard data representation techniques. HealthGAT has demonstrated its effectiveness in various healthcare scenarios through comprehensive evaluations against established methodologies. Specifically, our model shows outstanding performance in node classification and in downstream tasks such as readmission prediction and diagnosis classification. Our code is available at https://github.com/healthylaife/HealthGAT
Authors: Fahmida Liza Piya, Mehak Gupta, Rahmatollah Beheshti