RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records

Published 25 Feb 2024 in cs.CL, cs.AI, cs.IR, and q-bio.OT | (2403.00815v3)

Abstract: We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at https://github.com/ritaranx/RAM-EHR.
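
The two mechanisms named in the abstract, dense retrieval and co-training with consistency regularization, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder, the similarity function, or the form of the regularizer, so the cosine-similarity ranking, the symmetric KL penalty, the function names `top_k_passages` and `symmetric_kl_bernoulli`, and the toy vectors below are all assumptions chosen for illustration.

```python
import numpy as np


def top_k_passages(query_vec, passage_vecs, k=2):
    """Rank passages by cosine similarity to the query (assumed similarity)."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q
    order = np.argsort(-scores)[:k]  # indices of the k highest scores
    return order.tolist(), scores[order].tolist()


def symmetric_kl_bernoulli(p, q, eps=1e-8):
    """Mean symmetric KL divergence between two sets of Bernoulli predictions.

    Used here as one plausible consistency penalty between two co-trained
    models; the paper's actual regularizer may differ.
    """
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    kl_pq = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    kl_qp = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return float(np.mean(0.5 * (kl_pq + kl_qp)))


# Toy embeddings standing in for encoded knowledge passages (hypothetical).
passages = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# Toy embedding standing in for an encoded medical concept name (hypothetical).
concept = np.array([1.0, 0.0, 0.0])

indices, scores = top_k_passages(concept, passages, k=2)

# Predicted probabilities from a visit-based model and a knowledge-augmented
# model for the same patients (made-up numbers).
visit_probs = np.array([0.80, 0.20, 0.55])
knowledge_probs = np.array([0.70, 0.30, 0.60])
penalty = symmetric_kl_bernoulli(visit_probs, knowledge_probs)
```

In a training loop, a penalty of this kind would typically be added to each model's supervised loss with a weighting coefficient, so that the two models are pushed toward agreement while still fitting the labels.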
