
Protect and Extend -- Using GANs for Synthetic Data Generation of Time-Series Medical Records (2402.14042v2)

Published 21 Feb 2024 in cs.LG, cs.AI, and cs.CR

Abstract: Preserving private user data is of paramount importance for high Quality of Experience (QoE) and acceptability, particularly for services that handle sensitive data, such as IT-based health services. Because anonymization techniques have been shown to be prone to re-identification, synthetic data generation has gradually replaced anonymization, since it is less time- and resource-consuming and more robust to data leakage. Generative Adversarial Networks (GANs) have been used to generate synthetic datasets, especially GAN frameworks that adhere to differential privacy. This research compares state-of-the-art GAN-based models for generating synthetic time-series medical records of dementia patients that can be distributed without privacy concerns. Predictive modeling, autocorrelation, and distribution analysis are used to assess the Quality of Generating (QoG) of the generated data. The privacy preservation of each model is assessed by applying membership inference attacks to determine potential data leakage risks. Our experiments indicate that the privacy-preserving GAN (PPGAN) model outperforms the other models in privacy preservation while maintaining an acceptable level of QoG. The presented results can support better data protection for medical use cases in the future.
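
To make the autocorrelation check mentioned in the abstract concrete, the following is a minimal sketch of how temporal structure in a real and a synthetic series might be compared. It is not the authors' evaluation code: the function names `autocorrelation` and `acf_gap`, the choice of mean absolute difference as the gap measure, and the toy sine-wave data are all assumptions made for illustration.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )


def acf_gap(real, synthetic, max_lag=5):
    """Mean absolute difference between two autocorrelation curves.

    Hypothetical summary score: smaller values suggest the synthetic
    series reproduces the temporal structure of the real one more closely.
    """
    diff = autocorrelation(real, max_lag) - autocorrelation(synthetic, max_lag)
    return float(np.mean(np.abs(diff)))


# Toy data (assumed, not from the paper): a noisy periodic "real" signal,
# a synthetic series with the same dynamics, and pure noise.
rng = np.random.default_rng(0)
t = np.arange(200)
real = np.sin(t / 5.0) + 0.1 * rng.standard_normal(200)
good_synthetic = np.sin(t / 5.0) + 0.1 * rng.standard_normal(200)
bad_synthetic = rng.standard_normal(200)

print("gap (similar dynamics):", acf_gap(real, good_synthetic))
print("gap (white noise):     ", acf_gap(real, bad_synthetic))
```

In this toy setting the series that shares the real signal's dynamics yields a much smaller gap than white noise. A full assessment along the lines the abstract describes would combine such a check with predictive modeling and distribution analysis.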

Authors (5)
  1. Navid Ashrafi (7 papers)
  2. Vera Schmitt (8 papers)
  3. Robert P. Spang (4 papers)
  4. Sebastian Möller (77 papers)
  5. Jan-Niklas Voigt-Antons (28 papers)