
Privacy-Preserving Instructions for Aligning Large Language Models (2402.13659v2)

Published 21 Feb 2024 in cs.CR and cs.CL

Abstract: Service providers of LLM applications collect user instructions in the wild and use them to further align LLMs with users' intentions. These instructions, which potentially contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk not addressed by typical private optimization. To this end, we propose replacing real instructions with synthetic ones in data annotation and model fine-tuning. Formal differential privacy is guaranteed by generating the synthetic instructions with privately fine-tuned generators. Crucial to achieving the desired utility is our novel filtering algorithm, which matches the distribution of the synthetic instructions to that of the real ones. In both supervised fine-tuning and reinforcement learning from human feedback, extensive experiments demonstrate the high utility of the final set of synthetic instructions, which yield results comparable to real instructions. In supervised fine-tuning, models trained with private synthetic instructions outperform leading open-source models such as Vicuna.
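
The filtering idea in the abstract can be made concrete with a short sketch. Below is a minimal, hypothetical Python illustration of distribution-matched resampling under differential privacy, not the paper's actual algorithm: the function and variable names (`dp_matched_resample`, `real_emb`, `syn_texts`), the clustering choice, and the noise scale are assumptions for illustration, and the accounting that maps `noise_std` to a concrete (epsilon, delta) budget is omitted.

```python
# Hypothetical sketch of a distribution-matching filter for private
# synthetic instructions. Idea: cluster the synthetic instructions,
# privately count how many real instructions fall in each cluster,
# then resample the synthetic set to match the noisy real histogram.
import numpy as np
from sklearn.cluster import KMeans

def dp_matched_resample(real_emb, syn_emb, syn_texts,
                        n_clusters=100, noise_std=5.0, n_out=10000,
                        seed=0):
    """real_emb: (n_real, d) embeddings of real instructions (sensitive).
    syn_emb / syn_texts: embeddings and texts of DP-generated synthetic
    instructions (already private, so they can be reused freely)."""
    rng = np.random.default_rng(seed)
    # 1. Cluster the *synthetic* embeddings; this touches no real data.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    syn_labels = km.fit_predict(syn_emb)
    # 2. Assign each real instruction to its nearest synthetic cluster.
    real_labels = km.predict(real_emb)
    # 3. Privately estimate the real histogram: if each user contributes
    #    one instruction, the per-cluster counts have sensitivity 1, so
    #    Gaussian noise yields a DP guarantee (scale set by the budget).
    counts = np.bincount(real_labels, minlength=n_clusters).astype(float)
    noisy = np.maximum(counts + rng.normal(0, noise_std, n_clusters), 0)
    target = noisy / max(noisy.sum(), 1.0)  # target cluster distribution
    # 4. Resample synthetic instructions cluster-by-cluster so that their
    #    cluster histogram matches the (noisy) real one.
    out = []
    for c in range(n_clusters):
        idx = np.flatnonzero(syn_labels == c)
        take = int(round(target[c] * n_out))
        if len(idx) and take:
            out.extend(syn_texts[i] for i in rng.choice(idx, size=take))
    return out
```

Note the design choice in this sketch: the clusters are fit on synthetic data only, so the sole access to the sensitive real instructions is through the noisy per-cluster counts, which is where the differential privacy guarantee would come from.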

Authors (4)
  1. Da Yu (19 papers)
  2. Peter Kairouz (75 papers)
  3. Sewoong Oh (128 papers)
  4. Zheng Xu (73 papers)