Pre-training Differentially Private Models with Limited Public Data (2402.18752v2)

Published 28 Feb 2024 in cs.LG and cs.CR

Abstract: The superior performance of large foundation models relies on the use of massive amounts of high-quality data, which often contain sensitive, private and copyrighted material that requires formal protection. While differential privacy (DP) is a prominent method to gauge the degree of security provided to the models, its application is commonly limited to the model fine-tuning stage, due to the performance degradation when applying DP during the pre-training stage. Consequently, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training process. In this work, we first provide a theoretical understanding of the efficacy of DP training by analyzing the per-iteration loss improvement. We make a key observation that DP optimizers' performance degradation can be significantly mitigated by the use of limited public data, which leads to a novel DP continual pre-training strategy. Empirically, using only 10% of public data, our strategy can achieve DP accuracy of 41.5% on ImageNet-21k (with $\epsilon=8$), as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021, respectively, on par with state-of-the-art standard pre-training and substantially outperforming existing DP pre-trained models. Our DP pre-trained models are released in the fastDP library (https://github.com/awslabs/fast-differential-privacy/releases/tag/v2.1).
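The DP training analyzed in the abstract follows the standard DP-SGD recipe: each example's gradient is clipped to a fixed L2 norm and Gaussian noise is added before the parameter update, which is the source of the performance degradation that limited public data helps mitigate. Below is a minimal, illustrative PyTorch sketch of one such private step, not the paper's fastDP implementation; the function name dp_sgd_step and the hyperparameters (lr, clip_norm, noise_multiplier) are placeholder assumptions. In the continual pre-training setting described above, a non-private warm-up on the limited public data would precede private updates of this form.

```python
import torch
from torch import nn

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """Illustrative DP-SGD update: per-example gradient clipping plus Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-example gradients via microbatches of size 1 (simple, not efficient).
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Clip each per-example gradient to L2 norm <= clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Add Gaussian noise calibrated to the clipping norm, then average and step.
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.normal(0.0, noise_multiplier * clip_norm, size=p.shape)
            p.add_(-(lr / len(xs)) * (s + noise))

# Toy usage: a linear classifier on random data standing in for one private batch.
model = nn.Linear(16, 4)
xs, ys = torch.randn(8, 16), torch.randint(0, 4, (8,))
dp_sgd_step(model, nn.CrossEntropyLoss(), xs, ys)
```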
