
The Data Sharing Paradox of Synthetic Data in Healthcare (2503.20847v1)

Published 26 Mar 2025 in cs.DB, cs.CR, and cs.CY

Abstract: Synthetic data offers a promising solution to privacy concerns in healthcare by generating useful datasets in a privacy-aware manner. However, although synthetic data is typically developed with the intention of sharing said data, ambiguous reidentification risk assessments often prevent synthetic data from seeing the light of day. One of the main causes is that privacy metrics for synthetic data, which inform on reidentification risks, are not well-aligned with practical requirements and regulations regarding data sharing in healthcare. This article discusses the paradoxical situation where synthetic data is designed for data sharing but is often still restricted. We also discuss how the field should move forward to mitigate this issue.

Authors (9)
  1. Jim Achterberg
  2. Bram van Dijk
  3. Saif ul Islam
  4. Hafiz Muhammad Waseem
  5. Parisis Gallos
  6. Gregory Epiphaniou
  7. Carsten Maple
  8. Marcel Haas
  9. Marco Spruit