Differentially Private Latent Diffusion Models (2305.15759v5)
Abstract: Diffusion models (DMs) are one of the most widely used generative models for producing high-quality images. However, a flurry of papers points out that DMs are among the least private forms of image generators, as a significant number of near-identical replicas of training images can be extracted from them. Existing privacy-enhancing techniques for DMs, unfortunately, do not provide a good privacy-utility trade-off. In this paper, we aim to improve the current state of DMs with differential privacy (DP) by adopting Latent Diffusion Models (LDMs). LDMs are equipped with powerful pre-trained autoencoders that map high-dimensional pixels into lower-dimensional latent representations, in which DMs are trained, yielding more efficient and faster training of DMs. Rather than fine-tuning entire LDMs, we fine-tune only the attention modules of LDMs with DP-SGD, reducing the number of trainable parameters by roughly 90% and achieving a better privacy-accuracy trade-off. Our approach allows us to generate realistic, high-dimensional images (256x256) conditioned on text prompts with DP guarantees, which, to the best of our knowledge, has not been attempted before. Our approach provides a promising direction for training more powerful, yet training-efficient, differentially private DMs that produce high-quality DP images. Our code is available at https://anonymous.4open.science/r/DP-LDM-4525.
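The core idea in the abstract, updating only the attention parameters with DP-SGD while everything else stays frozen, can be illustrated with a small sketch. This is not the authors' implementation: the parameter names, the `attn` keyword used to pick attention weights, and the helpers `select_attention` and `dp_sgd_step` are all hypothetical, and plain Python lists stand in for tensors. It shows the standard DP-SGD recipe (clip each per-example gradient, sum, add Gaussian noise, average) applied to the selected subset only.

```python
import math
import random


def select_attention(params, keyword="attn"):
    """Return names of parameters treated as attention weights.

    The substring match on `keyword` is an assumption for illustration;
    a real model would select modules by type or by its own naming scheme.
    """
    return {name for name in params if keyword in name}


def dp_sgd_step(params, per_example_grads, trainable, lr, clip_norm,
                noise_multiplier, rng):
    """One DP-SGD update restricted to the `trainable` parameter names.

    params: dict mapping name -> list of floats
    per_example_grads: list of dicts (one per example), name -> list of floats
    Returns a new dict; parameters outside `trainable` are left unchanged.
    """
    names = sorted(trainable)
    batch_size = len(per_example_grads)
    summed = {n: [0.0] * len(params[n]) for n in names}

    for grads in per_example_grads:
        # L2 norm over the trainable parameters of this single example.
        norm = math.sqrt(sum(g * g for n in names for g in grads[n]))
        # Scale the gradient down so its norm is at most clip_norm.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        for n in names:
            for i, g in enumerate(grads[n]):
                summed[n][i] += g * scale

    new_params = {n: list(v) for n, v in params.items()}
    sigma = noise_multiplier * clip_norm  # noise std on the summed gradient
    for n in names:
        for i, total in enumerate(summed[n]):
            noisy = total + rng.gauss(0.0, sigma)
            new_params[n][i] -= lr * noisy / batch_size
    return new_params


# Toy usage with made-up parameter names and gradients.
params = {"block.attn.q": [1.0, 2.0], "block.conv.w": [3.0]}
grads = [
    {"block.attn.q": [3.0, 4.0], "block.conv.w": [10.0]},
    {"block.attn.q": [0.3, 0.4], "block.conv.w": [10.0]},
]
trainable = select_attention(params)
updated = dp_sgd_step(params, grads, trainable, lr=1.0, clip_norm=1.0,
                      noise_multiplier=0.0, rng=random.Random(0))
```

With `noise_multiplier=0.0` the step is deterministic, which makes the clipping easy to inspect; a DP guarantee requires a positive noise multiplier together with a privacy accountant, which this sketch omits. Restricting the update to fewer parameters also means the added noise is spread over fewer coordinates, which is one intuition for why a smaller trainable set can help the privacy-accuracy trade-off.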
- Saiyue Lyu
- Michael F. Liu
- Margarita Vinaroz
- Mijung Park