Private Fine-tuning of Large Language Models with Zeroth-order Optimization (2401.04343v2)

Published 9 Jan 2024 in cs.LG, cs.CL, and cs.CR

Abstract: Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner, but has proven difficult to scale to the era of foundation models. We introduce DP-ZO, a private fine-tuning framework for LLMs that privatizes zeroth-order optimization methods. A key insight in the design of our method is that the direction of the gradient in the zeroth-order optimization we use is random, and the only information derived from training data is the step size, i.e., a scalar. Therefore, we only need to privatize the scalar step size, which is memory-efficient. DP-ZO provides a privacy-utility trade-off across different tasks and model sizes that is comparable to DP-SGD in $(\varepsilon,\delta)$-DP. Notably, DP-ZO has significant advantages over DP-SGD in memory efficiency, and obtains higher utility in $\varepsilon$-DP when using the Laplace mechanism.
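The abstract's key idea, that only a scalar per example needs privatizing, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, parameters (`eps_scale`, `clip`, `noise_mult`), and the use of a two-point SPSA-style estimate with Gaussian noise are assumptions based on standard zeroth-order optimization and the Gaussian mechanism.

```python
import numpy as np

def dp_zo_step(params, loss_fn, batch, lr=1e-3, eps_scale=1e-3,
               clip=1.0, noise_mult=1.0, rng=None):
    """One hypothetical DP-ZO update: perturb params along a shared random
    direction z (data-independent, hence public), estimate the directional
    derivative from two loss evaluations per example, then clip and noise
    only that scalar before stepping."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(params.shape)  # random direction, no private info
    scalars = []
    for x in batch:
        l_plus = loss_fn(params + eps_scale * z, x)
        l_minus = loss_fn(params - eps_scale * z, x)
        g = (l_plus - l_minus) / (2 * eps_scale)  # scalar directional derivative
        scalars.append(float(np.clip(g, -clip, clip)))  # bound per-example sensitivity
    # Privatize the summed scalar (Gaussian mechanism; Laplace is the eps-DP variant)
    noisy_sum = sum(scalars) + rng.normal(0.0, noise_mult * clip)
    step_size = noisy_sum / len(batch)
    return params - lr * step_size * z
```

Because `z` is sampled independently of the data, the only data-dependent quantity released per step is the noisy scalar, which is why the method avoids the per-parameter gradient clipping and noise of DP-SGD and its memory overhead.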

Authors (5)
  1. Xinyu Tang
  2. Ashwinee Panda
  3. Milad Nasr
  4. Saeed Mahloujifar
  5. Prateek Mittal