Differentially Private Knowledge Distillation via Synthetic Text Generation (2403.00932v2)

Published 1 Mar 2024 in cs.LG, cs.CL, and cs.CR

Abstract: LLMs are achieving state-of-the-art performance on many downstream tasks. However, the growing urgency of data privacy puts pressure on practitioners to train LLMs with Differential Privacy (DP) on private data. Concurrently, the exponential growth in the parameter size of LLMs necessitates model compression before LLMs can be deployed on resource-constrained devices or in latency-sensitive applications. Differential privacy and model compression each generally incur some utility loss to achieve their objectives, and applying both simultaneously can compound the degradation. To this end, we propose DistilDP: a novel differentially private knowledge distillation algorithm that exploits synthetic data generated by a differentially private teacher LLM. The knowledge of the teacher is transferred to the student in two ways: first through the synthetic data itself (the hard labels), and second through the output distribution of the teacher evaluated on the synthetic data (the soft labels). Furthermore, if the teacher and student share a similar architecture, we can distill additional knowledge by aligning their hidden representations. Our experimental results demonstrate that DistilDP substantially improves utility over existing baselines, by at least $9.0$ PPL on the BigPatent dataset, under strong privacy parameters ($\epsilon=2$). These results advance privacy-preserving compression of autoregressive LLMs. Our code can be accessed here: https://github.com/james-flemings/dp_compress.
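
The abstract describes three knowledge-transfer signals: the DP synthetic text as hard labels, the teacher's output distribution on that text as soft labels, and, when the architectures are similar, hidden-representation alignment. The sketch below shows how such a combined objective could look in PyTorch; the function name, loss weights, temperature, and the assumption of matching hidden sizes are illustrative and not taken from the authors' implementation.

```python
# A minimal sketch of the three transfer terms described in the abstract.
import torch
import torch.nn.functional as F

def distildp_style_loss(student_logits, teacher_logits,
                        student_hidden, teacher_hidden,
                        synthetic_token_ids,
                        temperature=2.0, alpha=0.5, beta=0.3, gamma=0.2):
    """Combine hard-label, soft-label, and hidden-alignment terms.

    The weights, temperature, and single-layer hidden alignment are
    illustrative assumptions, not the authors' exact configuration.
    """
    vocab = student_logits.size(-1)

    # (1) Hard labels: next-token cross-entropy on the DP synthetic text itself.
    hard = F.cross_entropy(student_logits.reshape(-1, vocab),
                           synthetic_token_ids.reshape(-1),
                           ignore_index=-100)

    # (2) Soft labels: match the teacher's output distribution evaluated
    #     on the same synthetic data.
    soft = F.kl_div(F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1),
                    F.softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1),
                    reduction="batchmean") * temperature ** 2

    # (3) Hidden-state alignment: assumes teacher and student hidden sizes
    #     already match (or have been projected to a common dimension).
    hidden = F.mse_loss(student_hidden, teacher_hidden)

    return alpha * hard + beta * soft + gamma * hidden
```

Because the student only ever sees the DP teacher's synthetic text and outputs, its training is post-processing of a differentially private mechanism, so no additional noise is needed in this loss.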

Authors (2)
  1. James Flemings (6 papers)
  2. Murali Annavaram (42 papers)
Citations (8)

