
Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models (1905.07002v2)

Published 16 May 2019 in cs.CL

Abstract: Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but were shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties as well as utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.

Authors (2)
  1. Oren Melamud (5 papers)
  2. Chaitanya Shivade (11 papers)
Citations (32)