Differentially Private Diffusion Models (2210.09929v3)
Abstract: While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge, providing access to synthetic data instead. We build on the recent success of diffusion models (DMs) and introduce Differentially Private Diffusion Models (DPDMs), which enforce privacy using differentially private stochastic gradient descent (DP-SGD). We investigate the DM parameterization and the sampling algorithm, which turn out to be crucial ingredients in DPDMs, and propose noise multiplicity, a powerful modification of DP-SGD tailored to the training of DMs. We validate our novel DPDMs on image generation benchmarks and achieve state-of-the-art performance in all experiments. Moreover, on standard benchmarks, classifiers trained on DPDM-generated synthetic data perform on par with task-specific DP-SGD-trained classifiers, which has not been demonstrated before for DP generative models. Project page and code: https://nv-tlabs.github.io/DPDM.
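The abstract names DP-SGD and noise multiplicity without spelling out the mechanics. The following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard DP-SGD recipe (clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to the clipping norm, average) and reads noise multiplicity as averaging each example's denoising-loss gradient over several noise draws before clipping. The toy elementwise denoiser and all names (`toy_per_example_grads`, `dp_sgd_step`, `sigma`, `k`, `clip_norm`, `noise_multiplier`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def toy_per_example_grads(theta, batch, sigma, k, rng):
    """Per-example gradients of a toy denoising loss, averaged over k noise draws.

    Hypothetical toy model: the denoiser is theta * (x + sigma * n) elementwise,
    and the loss is the squared error to the clean example x.
    """
    # Draw k independent noise samples for every example: shape (B, k, d).
    noise = rng.normal(size=(batch.shape[0], k, batch.shape[1]))
    clean = batch[:, None, :]
    noisy = clean + sigma * noise
    residual = theta * noisy - clean
    grads = 2.0 * residual * noisy  # d/dtheta of the squared error, shape (B, k, d)
    # Average over the k draws so each example still contributes one gradient.
    return grads.mean(axis=1)


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per-example gradients, sum, add noise, average."""
    batch_size = per_example_grads.shape[0]
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale each gradient so its L2 norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_mean


rng = np.random.default_rng(0)
theta = np.ones(4)
batch = rng.normal(size=(8, 4))
grads = toy_per_example_grads(theta, batch, sigma=0.5, k=4, rng=rng)
theta = dp_sgd_step(theta, grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Under this reading, each example still yields a single clipped gradient, so the averaging over `k` draws happens before the step that bounds each example's contribution to the update.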
Authors:
- Tim Dockhorn
- Tianshi Cao
- Arash Vahdat
- Karsten Kreis