
Inference of Dependency Knowledge Graph for Electronic Health Records (2312.15611v1)

Published 25 Dec 2023 in stat.ME and stat.ML

Abstract: The effective analysis of high-dimensional Electronic Health Record (EHR) data, with substantial potential for healthcare research, presents notable methodological challenges. Employing predictive modeling guided by a knowledge graph (KG), which enables efficient feature selection, can enhance both statistical efficiency and interpretability. While various methods have emerged for constructing KGs, existing techniques often lack statistical certainty concerning the presence of links between entities, especially in scenarios where the utilization of patient-level EHR data is limited due to privacy concerns. In this paper, we propose the first inferential framework for deriving a sparse KG with statistical guarantee based on the dynamic log-linear topic model proposed by Arora et al. (2016). Within this model, the KG embeddings are estimated by performing singular value decomposition on the empirical pointwise mutual information matrix, offering a scalable solution. We then establish entrywise asymptotic normality for the KG low-rank estimator, enabling the recovery of sparse graph edges with controlled type I error. Our work uniquely addresses the under-explored domain of statistical inference about non-linear statistics under the low-rank temporal dependent models, a critical gap in existing research. We validate our approach through extensive simulation studies and then apply the method to real-world EHR data in constructing clinical KGs and generating clinical feature embeddings.
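
The estimation step named in the abstract (a singular value decomposition of the empirical pointwise mutual information matrix) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pmi_matrix` and `low_rank_embeddings`, the toy co-occurrence counts, the `eps` guard, and the choice of scaling the singular vectors by the square root of the singular values are all assumptions made for illustration. The sketch covers only the embedding estimate, not the entrywise inference or the edge-selection procedure.

```python
import numpy as np


def pmi_matrix(cooc, eps=1e-12):
    """Empirical PMI from a co-occurrence count matrix (hypothetical helper)."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)   # marginal counts per row code
    col = cooc.sum(axis=0, keepdims=True)   # marginal counts per column code
    # PMI(i, j) = log( C_ij * C_total / (C_i. * C_.j) ); eps guards against log(0)
    return np.log((cooc * total + eps) / (row * col + eps))


def low_rank_embeddings(pmi, rank):
    """Truncated SVD of the PMI matrix (hypothetical helper).

    Returns (embeddings, low-rank approximation of the PMI matrix).
    """
    u, s, vt = np.linalg.svd(pmi)
    u_r, s_r, vt_r = u[:, :rank], s[:rank], vt[:rank, :]
    embeddings = u_r * np.sqrt(s_r)          # one row per code (assumed scaling)
    approx = (u_r * s_r) @ vt_r              # best rank-`rank` approximation
    return embeddings, approx


# Toy symmetric co-occurrence counts for three codes (made-up numbers).
counts = np.array([[10.0, 2.0, 1.0],
                   [2.0, 8.0, 3.0],
                   [1.0, 3.0, 12.0]])
pmi = pmi_matrix(counts)
emb, pmi_low_rank = low_rank_embeddings(pmi, rank=2)
```

In this sketch `emb` holds one two-dimensional vector per code and `pmi_low_rank` is the rank-2 approximation of the empirical PMI matrix. Testing individual entries for edges with controlled type I error would additionally require the variance estimates developed in the paper, which are not reproduced here.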

References (54)
  1. Large-scale structural and textual similarity-based mining of knowledge graph to predict drug–drug interactions. Journal of Web Semantics 44, 104–117.
  2. Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities. Journal of Big Data 10(1), 81.
  3. Semisupervised Calibration of Risk with Noisy Event Times (SCORNET) using electronic health record data. Biostatistics.
  4. A latent variable model approach to pmi-based word embeddings. Transactions of the Association for Computational Linguistics 4, 385–399.
  5. Linear algebraic structure of word senses, with applications to polysemy. Transactions of the Association for Computational Linguistics 6, 483–495.
  6. Behaviour in frontotemporal dementia, Alzheimer’s disease and vascular dementia. Acta Neurologica Scandinavica 103(6), 367–378.
  7. Network analysis of unstructured EHR data for clinical research. AMIA Summits on Translational Science Proceedings 2013, 14–18.
  8. Clinical concept embeddings learned from massive sources of multimodal medical data. In PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020, pp.  295–306. World Scientific.
  9. The control of the false discovery rate in multiple testing under dependency. Annals of statistics, 1165–1188.
  10. A semantic matching energy function for learning with multi-relational data: Application to word-sense disambiguation. Machine Learning 94, 233–259.
  11. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems 26.
  12. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory 56(5), 2053–2080.
  13. Uncertainty quantification for matrix compressed sensing and quantum tomography problems. In High Dimensional Probability VIII: The Oaxaca Volume, pp.  385–430. Springer.
  14. Robustly extracting medical knowledge from EHRs: a case study of learning a health knowledge graph. In PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020, pp.  19–30. World Scientific.
  15. Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences 116(46), 22931–22937.
  16. Inference for low-rank models. The Annals of statistics 51(3), 1309–1330.
  17. A survey on knowledge graph embedding: Approaches, applications and benchmarks. Electronics 9(5), 750.
  18. Davis, C. and W. M. Kahan (1970). The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis 7(1), 1–46.
  19. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pp.  4171–4186.
  20. Building the graph of medicine from millions of clinical narratives. Scientific Data 1, 140032.
  21. De-biasing low-rank projection for matrix completion. In Wavelets and sparsity XVII, Volume 10394, pp.  269–281. SPIE.
  22. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare 3, 1–23.
  23. BERT based clinical knowledge extraction for biomedical knowledge graph construction and analysis. Computer Methods and Programs in Biomedicine Update 1, 100042.
  24. Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data. NPJ digital medicine 4(1), 1–11.
  25. Gated tree-based graph attention network (gtgat) for medical knowledge graph reasoning. Artificial Intelligence in Medicine 130, 102329.
  26. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4), 1234–1240.
  27. Self-alignment pretraining for biomedical entity representations. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.  4228–4238.
  28. Knowledge graph embedding with electronic health records data via latent graphical block model.
  29. Using UMLS concept unique identifiers (CUIs) for word sense disambiguation in the biomedical domain. In AMIA Annual Symposium Proceedings, Volume 2007, pp.  533–537. American Medical Informatics Association.
  30. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems 26, 3111–3119.
  31. Embedding electronic health records onto a knowledge network recognizes prodromal features of multiple sclerosis and predicts diagnosis. Journal of the American Medical Informatics Association 29(3), 424–434.
  32. A three-way model for collective learning on multi-relational data. In Icml, Volume 11, pp.  3104482–3104584.
  33. Nussbaum, R. L. and C. E. Ellis (2003). Alzheimer’s disease and Parkinson’s disease. New England Journal of Medicine 348(14), 1356–1364.
  34. Galantamine for Alzheimer’s disease. The Cochrane database of systematic reviews (3), CD001747–CD001747.
  35. OpenAI (2023a). ChatGPT: optimizing language models for dialogue. https://openai.com/blog/chatgpt/.
  36. OpenAI (2023b). GPT-4 technical report. ArXiv. https://openai.com/research/gpt-4.
  37. Treating behavioral and psychological symptoms in patients with psychosis of Alzheimer’s disease using risperidone. International psychogeriatrics 19(2), 227–240.
  38. Memantine in moderate-to-severe Alzheimer’s disease. New England Journal of Medicine 348(14), 1333–1341.
  39. Learning a health knowledge graph from electronic medical records. Scientific reports 7(1), 1–11.
  40. Incorporating medical knowledge in BERT for clinical relation extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp.  5357–5366.
  41. Sematyp: a knowledge graph based literature mining method for drug discovery. BMC bioinformatics 19, 1–11.
  42. EHR-oriented knowledge graph system: toward efficient utilization of non-used information buried in routine clinical practice. IEEE Journal of Biomedical and Health Informatics 25(7), 2463–2475.
  43. Reasoning with neural tensor networks for knowledge base completion. Advances in neural information processing systems 26.
  44. Epidemiology of dementias and Alzheimer’s disease. Archives of medical research 43(8), 600–608.
  45. Tsuno, N. (2009). Donepezil in the treatment of patients with Alzheimer’s disease. Expert review of neurotherapeutics 9(5), 591–598.
  46. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI conference on artificial intelligence, Volume 28.
  47. Multimodal representation learning for predicting molecule–disease relations. Bioinformatics 39(2), btad085.
  48. Xia, D. (2021). Normal approximation and confidence region of singular subspaces. Electronic Journal of Statistics 15(2), 3798–3851.
  49. Inference for low-rank tensors—no need to debias. The Annals of Statistics 50(2), 1220–1245.
  50. Demonstration of an anti-oxidative stress mechanism of quetiapine. The FEBS Journal 275(14), 3718–3728.
  51. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
  52. HKGB: an inclusive, extensible, intelligent, semi-auto-constructed knowledge graph framework for healthcare with clinicians’ expertise incorporated. Information Processing & Management 57(6), 102324.
  53. Multi-source learning via completion of block-wise overlapping noisy matrices. Journal of Machine Learning Research 24(221), 1–43.
  54. Multiview incomplete knowledge graph integration with application to cross-institutional EHR data harmonization. Journal of Biomedical Informatics 133, 104147.
Authors (6)
  1. Zhiwei Xu (84 papers)
  2. Ziming Gan (2 papers)
  3. Doudou Zhou (21 papers)
  4. Shuting Shen (5 papers)
  5. Junwei Lu (31 papers)
  6. Tianxi Cai (74 papers)
