Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
96 tokens/sec
Gemini 2.5 Pro Premium
42 tokens/sec
GPT-5 Medium
20 tokens/sec
GPT-5 High Premium
27 tokens/sec
GPT-4o
100 tokens/sec
DeepSeek R1 via Azure Premium
86 tokens/sec
GPT OSS 120B via Groq Premium
464 tokens/sec
Kimi K2 via Groq Premium
181 tokens/sec
2000 character limit reached

Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIV (2208.08655v2)

Published 18 Aug 2022 in cs.LG

Abstract: Clinical data usually cannot be freely distributed due to their highly confidential nature and this hampers the development of machine learning in the healthcare domain. One way to mitigate this problem is by generating realistic synthetic datasets using generative adversarial networks (GANs). However, GANs are known to suffer from mode collapse thus creating outputs of low diversity. This lowers the quality of the synthetic healthcare data, and may cause it to omit patients of minority demographics or neglect less common clinical practices. In this paper, we extend the classic GAN setup with an additional variational autoencoder (VAE) and include an external memory to replay latent features observed from the real samples to the GAN generator. Using antiretroviral therapy for human immunodeficiency virus (ART for HIV) as a case study, we show that our extended setup overcomes mode collapse and generates a synthetic dataset that accurately describes severely imbalanced class distributions commonly found in real-world clinical variables. In addition, we demonstrate that our synthetic dataset is associated with a very low patient disclosure risk, and that it retains a high level of utility from the ground truth dataset to support the development of downstream machine learning algorithms.

Citations (18)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.