Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation (2309.11765v2)
Abstract: We study the problem of in-context learning (ICL) with large language models (LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak or regurgitate the private examples demonstrated in the prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and we show empirically that it achieves effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm with non-private ICL and zero-shot solutions. Our results demonstrate that our algorithm achieves competitive performance under strong privacy levels. These results open up new possibilities for privacy-protected ICL across a broad range of applications.
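The abstract only states that synthetic demonstrations are generated under a formal DP guarantee; the concrete mechanism is described in the paper body. As a loose, illustrative sketch of one common pattern for DP text generation (a Gaussian-mechanism aggregation of next-token distributions over disjoint subsets of private examples; this is an assumption for illustration, not necessarily the authors' exact algorithm, and `dp_next_token` is a hypothetical helper name):

```python
import numpy as np

def dp_next_token(per_subset_probs: np.ndarray, sigma: float, rng) -> int:
    """Pick a next token by noisy aggregation (Gaussian mechanism sketch).

    per_subset_probs: shape (m, V); each row is the next-token
    distribution obtained by prompting the LLM with one disjoint
    subset of private examples. Removing a single private example
    changes at most one row, so the average has bounded per-query
    sensitivity on the order of 1/m.
    """
    avg = per_subset_probs.mean(axis=0)                     # aggregate histograms
    noisy = avg + rng.normal(scale=sigma, size=avg.shape)   # calibrated Gaussian noise
    return int(np.argmax(noisy))                            # release only the argmax token

# Toy demo: vocabulary of 5 tokens, 4 disjoint private subsets.
rng = np.random.default_rng(0)
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.65, 0.15, 0.10, 0.05, 0.05],
    [0.75, 0.05, 0.10, 0.05, 0.05],
])
token = dp_next_token(probs, sigma=0.05, rng=rng)  # token 0 dominates despite noise
```

Generating each synthetic token this way, and accounting for the total number of queries with a DP composition theorem, is the standard route to an end-to-end (epsilon, delta) guarantee for the generated demonstrations.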