
Synthesizing Mixed-type Electronic Health Records using Diffusion Models (2302.14679v2)

Published 28 Feb 2023 in cs.LG and cs.CL

Abstract: Electronic Health Records (EHRs) contain sensitive patient information, which raises privacy concerns when such data are shared. Synthetic data generation is a promising way to mitigate these risks and often relies on deep generative models such as Generative Adversarial Networks (GANs). However, recent studies have shown that diffusion models offer several advantages over GANs, such as generating more realistic synthetic data and training more stably, across data modalities including images, text, and sound. In this work, we investigate the potential of diffusion models for generating realistic mixed-type tabular EHRs, comparing the TabDDPM model with existing methods on four datasets in terms of data quality, utility, privacy, and augmentation. Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models on all evaluation metrics except privacy, which confirms the trade-off between privacy and utility.
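To make "diffusion models for mixed-type tabular data" concrete, here is a minimal sketch of the forward (noising) process only. It assumes the general TabDDPM design of Gaussian diffusion for numerical columns and multinomial diffusion for categorical columns. The linear noise schedule, the function names and the example columns are illustrative assumptions, not the authors' code. The learned reverse (denoising) network and the training loop are omitted.

```python
import numpy as np


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_numerical(x0: np.ndarray, alpha_bar: float, rng: np.random.Generator) -> np.ndarray:
    """Gaussian forward step: sample x_t ~ N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def noise_categorical(x0: np.ndarray, num_classes: int, alpha_bar: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Multinomial forward step: sample x_t ~ Cat(alpha_bar * onehot(x_0) + (1 - alpha_bar) / K)."""
    onehot = np.eye(num_classes)[x0]                     # shape (n, K)
    probs = alpha_bar * onehot + (1.0 - alpha_bar) / num_classes
    # Inverse-CDF sampling: count how many cumulative probabilities lie below u.
    u = rng.random((x0.shape[0], 1))
    idx = (u > np.cumsum(probs, axis=1)).sum(axis=1)
    # Guard against floating-point round-off pushing the index to K.
    return np.minimum(idx, num_classes - 1)


# Usage on two hypothetical EHR columns.
rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars(1000)
age_scaled = np.array([0.3, -1.2, 0.8])   # hypothetical numerical column, already standardized
sex = np.array([0, 1, 1])                 # hypothetical categorical column with K = 2
t = 500
age_t = noise_numerical(age_scaled, alpha_bars[t], rng)
sex_t = noise_categorical(sex, 2, alpha_bars[t], rng)
```

As `t` grows, `alpha_bar` approaches 0. Numerical columns then approach standard Gaussian noise and categorical columns approach a uniform distribution over their categories. The generative model learns to reverse this process.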

Authors (6)
  1. Taha Ceritli (8 papers)
  2. Ghadeer O. Ghosheh (5 papers)
  3. Vinod Kumar Chauhan (18 papers)
  4. Tingting Zhu (46 papers)
  5. Andrew P. Creagh (4 papers)
  6. David A. Clifton (54 papers)
Citations (14)