
Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision (2404.06723v1)

Published 10 Apr 2024 in cs.LG and cs.CL

Abstract: Modern electronic health records (EHRs) hold immense promise in tracking personalized patient health trajectories through sequential deep learning, owing to their extensive breadth, scale, and temporal granularity. Nonetheless, effectively leveraging multiple modalities from EHRs poses significant challenges, given their complex characteristics such as high dimensionality, multimodality, sparsity, varied recording frequencies, and temporal irregularities. To this end, this paper introduces a novel multimodal contrastive learning framework, specifically focusing on medical time series and clinical notes. To tackle the challenge of sparsity and irregular time intervals in medical time series, the framework integrates temporal cross-attention transformers with a dynamic embedding and tokenization scheme for learning multimodal feature representations. To harness the interconnected relationships between medical time series and clinical notes, the framework employs a global contrastive loss, aligning a patient's multimodal feature representations with the corresponding discharge summaries. Since discharge summaries uniquely pertain to individual patients and represent a holistic view of the patient's hospital stay, machine learning models are led to learn discriminative multimodal features via global contrasting. Extensive experiments with a real-world EHR dataset demonstrated that our framework outperformed state-of-the-art approaches on the exemplar task of predicting the occurrence of nine postoperative complications for more than 120,000 major inpatient surgeries using multimodal data from the UF Health system, split among three hospitals (UF Health Gainesville, UF Health Jacksonville, and UF Health Jacksonville-North).
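The abstract names a global contrastive loss that aligns each patient's multimodal representation with that patient's discharge summary, but it does not give the formula. The following is a minimal sketch of one common way to realize such an objective, assuming a CLIP-style symmetric InfoNCE loss over a batch. The function name `global_contrastive_loss`, the `temperature` default, and the use of plain Python lists are illustrative assumptions, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def global_contrastive_loss(patient_embs, summary_embs, temperature=0.07):
    """Symmetric InfoNCE loss (an assumed form, not the paper's stated one).

    patient_embs[i] and summary_embs[i] are assumed to belong to the same
    patient; every other pairing in the batch is treated as a negative.
    """
    p = [l2_normalize(v) for v in patient_embs]
    s = [l2_normalize(v) for v in summary_embs]
    n = len(p)

    # Cosine similarities scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(p[i], s[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Patient -> summary direction: the matching summary sits on the diagonal.
    p2s = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Summary -> patient direction: same computation on the transposed matrix.
    transposed = [[logits[i][j] for i in range(n)] for j in range(n)]
    s2p = -sum(log_softmax(transposed[j])[j] for j in range(n)) / n

    return 0.5 * (p2s + s2p)
```

Under this assumed form, a batch whose patient and summary embeddings line up yields a loss near zero, while swapping the summaries between patients yields a large loss. That is the sense in which a per-patient discharge summary can act as a discriminative supervision signal.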

Authors (14)
  1. Yingbo Ma (20 papers)
  2. Suraj Kolla (3 papers)
  3. Zhenhong Hu (9 papers)
  4. Dhruv Kaliraman (2 papers)
  5. Victoria Nolan (2 papers)
  6. Ziyuan Guan (20 papers)
  7. Yuanfang Ren (24 papers)
  8. Brooke Armfield (5 papers)
  9. Tezcan Ozrazgat-Baslanti (32 papers)
  10. Jeremy A. Balch (2 papers)
  11. Tyler J. Loftus (15 papers)
  12. Parisa Rashidi (59 papers)
  13. Azra Bihorac (51 papers)
  14. Benjamin Shickel (24 papers)