Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration (2311.06062v4)
Abstract: Membership Inference Attacks (MIAs) aim to infer whether a target data record was used for model training. Existing MIAs designed for LLMs fall into two types: reference-free and reference-based attacks. Although reference-based attacks appear to achieve promising performance by calibrating the probability measured on the target model with reference models, this illusion of privacy risk depends heavily on a reference dataset that closely resembles the training set. Both types of attacks are predicated on the hypothesis that training records consistently have a higher probability of being sampled. However, this hypothesis relies heavily on overfitting of the target model, which is mitigated by regularization methods and by the generalization of LLMs. These factors lead to high false-positive rates for MIAs in practical scenarios. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, we introduce a self-prompt approach that constructs the dataset for fine-tuning the reference model by prompting the target LLM itself. In this way, the adversary can collect a dataset with a similar distribution using only public APIs. Furthermore, we introduce probabilistic variation, a more reliable membership signal based on LLM memorization rather than overfitting, from which we rediscover the neighbour attack with theoretical grounding. Comprehensive evaluation on three datasets and four exemplary LLMs shows that SPV-MIA raises the AUC of MIAs from 0.7 to a significantly higher level of 0.9. Our code and dataset are available at: https://github.com/tsinghua-fib-lab/NeurIPS2024_SPV-MIA
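The scoring pipeline sketched in the abstract can be illustrated in a few lines. The snippet below is a minimal sketch, not the authors' released implementation: it assumes a target LLM fine-tuned on private data and a reference LLM that has already been fine-tuned on text sampled from the target model via self-prompting, and it approximates probabilistic variation with a simple neighbour comparison, where paraphrased neighbours are assumed to come from any off-the-shelf paraphraser. All function names and checkpoints here are hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_log_prob(model, tokenizer, text, device="cpu"):
    """Average per-token log-likelihood of `text` under a causal LM."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()  # loss is the mean cross-entropy, so negate it

def probabilistic_variation(model, tokenizer, text, neighbours, device="cpu"):
    """Simplified memorization signal: how much more likely the record is
    than the average of its paraphrased neighbours under `model`."""
    p_x = mean_log_prob(model, tokenizer, text, device)
    p_n = sum(mean_log_prob(model, tokenizer, n, device) for n in neighbours) / len(neighbours)
    return p_x - p_n

def spv_mia_score(target_model, reference_model, tokenizer, text, neighbours, device="cpu"):
    """Self-calibrated score: variation on the target model minus variation on
    the self-prompt reference model. Larger values suggest membership."""
    v_tgt = probabilistic_variation(target_model, tokenizer, text, neighbours, device)
    v_ref = probabilistic_variation(reference_model, tokenizer, text, neighbours, device)
    return v_tgt - v_ref

if __name__ == "__main__":
    # Hypothetical checkpoints: in practice, load the fine-tuned target model
    # and a reference model fine-tuned on self-prompted samples from it.
    tok = AutoTokenizer.from_pretrained("gpt2")
    target = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    reference = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    record = "The patient was prescribed 20mg of atorvastatin daily."
    neighbours = ["The patient received a daily 20mg dose of atorvastatin.",
                  "Atorvastatin, 20mg per day, was prescribed to the patient."]
    print(spv_mia_score(target, reference, tok, record, neighbours))
```

Thresholding this score over a set of candidate records yields a membership classifier; with known membership labels, the AUC reported in the paper can be computed from these scores.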
Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang