Overview of DP-CGAN: Differentially Private Synthetic Data and Label Generation
The paper "DP-CGAN: Differentially Private Synthetic Data and Label Generation" introduces a novel framework for training Generative Adversarial Networks (GANs) with differential privacy, ensuring that the privacy of individuals in the training datasets is preserved. This framework addresses a significant gap in previous research, where GANs were primarily used to generate synthetic data without corresponding labels—a limitation for applications requiring labeled datasets.
Motivation and Core Contribution
GANs are a powerful tool for generating synthetic data, but they raise privacy concerns: standard GAN models are susceptible to attacks such as model inversion and membership inference, which can leak information about the training data. The risk is particularly acute for data with stringent privacy requirements, such as medical or financial records.
The contribution of this work is twofold:
- Improved Gradient Clipping Mechanism: The authors propose a clipping and perturbation strategy in which the gradients of the discriminator loss on real and on fake data are clipped separately, giving finer control over the sensitivity of the update to the real, sensitive data (see the first sketch after this list).
- Rényi Differential Privacy (RDP) Accountant: Using the RDP accountant, the framework tracks the privacy budget more tightly than traditional approaches such as the Moments Accountant, so less noise is needed for the same privacy guarantee and model utility is better preserved (see the accounting sketch below).
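To make the per-source clipping idea concrete, here is a minimal sketch of one private discriminator update in PyTorch. This is not the authors' implementation: `disc`, `opt`, and the hyperparameter defaults are placeholder assumptions, and clipping is applied per batch rather than per example for brevity, with Gaussian noise scaled by the clipping bound in the usual DP-SGD style.

```python
import torch
import torch.nn.functional as F

def dp_discriminator_step(disc, opt, x_real, x_fake,
                          clip_norm=1.0, noise_multiplier=1.1):
    """One private discriminator update with per-source gradient clipping."""
    params = [p for p in disc.parameters() if p.requires_grad]

    # Separate losses for the real (sensitive) and fake batches.
    logits_real = disc(x_real)
    loss_real = F.binary_cross_entropy_with_logits(
        logits_real, torch.ones_like(logits_real))
    logits_fake = disc(x_fake.detach())
    loss_fake = F.binary_cross_entropy_with_logits(
        logits_fake, torch.zeros_like(logits_fake))

    def clipped_grads(loss):
        # Clip the global gradient norm of this source to clip_norm.
        grads = torch.autograd.grad(loss, params)
        total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total + 1e-6), max=1.0)
        return [g * scale for g in grads]

    # Clip each source separately, then add Gaussian noise scaled
    # to the clipping bound before applying the update.
    g_real, g_fake = clipped_grads(loss_real), clipped_grads(loss_fake)
    opt.zero_grad()
    for p, gr, gf in zip(params, g_real, g_fake):
        p.grad = gr + gf + noise_multiplier * clip_norm * torch.randn_like(p)
    opt.step()
```

A faithful implementation would clip per-example gradients within each batch, as in DP-SGD; the per-batch version above only illustrates how the real and fake contributions are bounded independently.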
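The RDP accounting itself can be illustrated with a self-contained calculation. The sketch below composes the Rényi bound of the Gaussian mechanism, which is α/(2σ²) per step at order α, over a number of steps and converts the best order into an (ε, δ) guarantee. It deliberately ignores privacy amplification by subsampling, which a production accountant (e.g. the RDP accountant in TensorFlow Privacy) exploits for much tighter bounds; the step count, noise multiplier, and δ are illustrative.

```python
import numpy as np

def rdp_gaussian(alpha, sigma):
    # Renyi DP of the Gaussian mechanism with noise multiplier sigma:
    # RDP(alpha) = alpha / (2 * sigma^2) per application (sensitivity 1).
    return alpha / (2.0 * sigma ** 2)

def epsilon_spent(steps, sigma, delta, orders=np.arange(2, 257)):
    # RDP composes additively over steps; the standard conversion to
    # (epsilon, delta)-DP is eps = RDP(alpha) + log(1/delta)/(alpha - 1),
    # minimized over the candidate orders.
    rdp = steps * rdp_gaussian(orders, sigma)
    return float(np.min(rdp + np.log(1.0 / delta) / (orders - 1)))

# Illustrative budget: prints roughly eps ~ 6.8 for these settings.
print(epsilon_spent(steps=100, sigma=8.0, delta=1e-5))
```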
Experimental Evaluation
The authors evaluate DP-CGAN empirically on the MNIST dataset. The results are notable for two reasons:
- Visual Quality: The generated images remain visually faithful, and each comes with a synthetic label, while strong differential privacy guarantees are maintained.
- Numerical Performance: Classifiers trained on the synthetic data achieve an AUROC of 87.57%, compared to 92.17% when the same classifier is trained directly on real data, a reasonable trade-off between privacy and utility (a sketch of this evaluation protocol follows).
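The evaluation protocol behind these numbers, training on synthetic image-label pairs and testing on real held-out data, can be sketched as follows. The arrays are random placeholders standing in for DP-CGAN output and the MNIST test split, logistic regression is just one plausible downstream classifier, and AUROC is computed one-vs-rest over the ten digit classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Placeholder arrays: flattened 28x28 images with digit labels 0-9.
x_syn, y_syn = np.random.rand(1000, 784), np.random.randint(0, 10, 1000)
x_test, y_test = np.random.rand(500, 784), np.random.randint(0, 10, 500)

# Train on synthetic image-label pairs, evaluate on held-out real data.
clf = LogisticRegression(max_iter=200).fit(x_syn, y_syn)
scores = clf.predict_proba(x_test)
print("AUROC:", roc_auc_score(y_test, scores, multi_class="ovr"))
```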
Implications and Future Directions
The implications of this research extend to several domains where privacy-preserving synthetic data is crucial. These include healthcare, where patient data sensitivity is paramount, and finance, where transactional data must be protected.
Methodological advances such as those in DP-CGAN point toward training GANs with stronger privacy guarantees and better utility of the generated data. Future work may adapt the framework to datasets more complex than MNIST, such as CIFAR-10 or higher-resolution datasets like CelebA, broadening the applicability of differentially private GANs.
In conclusion, this work lays the groundwork for further exploration in differentially private generative models, offering significant improvements in both methodology and results when benchmarked against prior approaches.