Private Fine-tuning of Large Language Models with Zeroth-order Optimization (2401.04343v2)
Abstract: Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner, but has proven difficult to scale to the era of foundation models. We introduce DP-ZO, a framework for privately fine-tuning LLMs by privatizing zeroth-order optimization. The key insight behind our design is that the gradient direction in the zeroth-order optimization we use is random, and the only information derived from the training data is the step size, i.e., a scalar. Therefore, we need only privatize this scalar step size, which is memory-efficient. DP-ZO provides a strong privacy-utility trade-off across different tasks and model sizes, comparable to DP-SGD under $(\varepsilon,\delta)$-DP. Notably, DP-ZO has significant memory-efficiency advantages over DP-SGD, and achieves higher utility under $\varepsilon$-DP when using the Laplace mechanism.
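The abstract's key insight can be sketched in code: the perturbation direction is sampled independently of the data, so only the scalar finite-difference estimate needs to be clipped and noised. The following is a minimal, hypothetical NumPy sketch (function names, hyperparameters, and the Gaussian mechanism shown here are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def dp_zo_step(params, loss_fn, batch, lr=0.1, eps_perturb=1e-3,
               clip=1.0, noise_mult=0.0, rng=None):
    """One illustrative DP-ZO-style update (sketch, not the paper's code).

    The direction z is random and data-independent, hence public. Only the
    per-example scalar finite-difference estimate of the directional
    derivative touches the training data, so it alone is clipped and
    noised (a Gaussian mechanism is assumed here for illustration).
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)  # public random direction
    diffs = []
    for x in batch:
        # Two forward passes per example (SPSA-style finite difference)
        d = (loss_fn(params + eps_perturb * z, x)
             - loss_fn(params - eps_perturb * z, x)) / (2 * eps_perturb)
        diffs.append(np.clip(d, -clip, clip))  # clip per-example scalar
    # Privatize the summed scalar, not the full gradient vector
    noisy_sum = sum(diffs) + rng.normal(0.0, noise_mult * clip)
    step_size = noisy_sum / len(batch)
    return params - lr * step_size * z  # scalar step along public direction
```

Because only a single scalar per example is privatized, no per-example gradient vectors need to be stored, which is the source of the memory savings over DP-SGD.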
Authors: Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal