CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines (2402.04400v2)

Published 6 Feb 2024 in cs.LG, cs.AI, and cs.CY

Abstract: Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.

Authors (11)
  1. Chao Pang (23 papers)
  2. Xinzhuo Jiang (3 papers)
  3. Nishanth Parameshwar Pavinkurve (1 paper)
  4. Krishna S. Kalluri (2 papers)
  5. Elise L. Minto (1 paper)
  6. Jason Patterson (1 paper)
  7. Linying Zhang (7 papers)
  8. George Hripcsak (21 papers)
  9. Noémie Elhadad (28 papers)
  10. Karthik Natarajan (18 papers)
  11. Gamze Gürsoy (3 papers)
Citations (5)