SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks (2403.08481v1)

Published 13 Mar 2024 in cs.LG and cs.CR

Abstract: Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. This fine-tuning data is especially likely to contain personal or sensitive information about individuals, resulting in increased privacy risk. Membership inference is the most commonly employed attack for assessing the privacy leakage of a machine learning model. However, limited research is available on the factors that affect the vulnerability of LLMs to this kind of attack, or on the applicability of different defense strategies in the language domain. We provide the first systematic review of the vulnerability of fine-tuned LLMs to membership inference attacks, the various factors that come into play, and the effectiveness of different defense strategies. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.
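For context on the attack the paper evaluates defenses against: the simplest membership inference baseline thresholds a model's per-example loss, exploiting the fact that fine-tuned models typically assign lower loss to training members than to unseen non-members. The sketch below is illustrative only and uses synthetic loss values (the distributions and threshold are assumptions, not the paper's setup):

```python
import random

random.seed(0)

# Synthetic per-example losses: members (seen during fine-tuning)
# tend to have lower loss than non-members (unseen examples).
member_losses = [random.gauss(0.5, 0.2) for _ in range(1000)]
nonmember_losses = [random.gauss(1.5, 0.4) for _ in range(1000)]

def loss_threshold_attack(losses, threshold):
    """Predict 'member' whenever the loss falls below the threshold."""
    return [loss < threshold for loss in losses]

threshold = 1.0  # illustrative; real attacks calibrate this on shadow models
true_positives = sum(loss_threshold_attack(member_losses, threshold))
false_positives = sum(loss_threshold_attack(nonmember_losses, threshold))

# Balanced attack accuracy: 0.5 means the attacker learns nothing.
accuracy = (true_positives + (1000 - false_positives)) / 2000
print(f"attack accuracy: {accuracy:.2f}")
```

Defenses such as differentially private training aim to shrink the gap between the two loss distributions, pushing this accuracy back toward the 0.5 chance level.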

Authors (3)
  1. Guy Amit (15 papers)
  2. Abigail Goldsteen (9 papers)
  3. Ariel Farkash (6 papers)
Citations (5)