Sampling-based Pseudo-Likelihood for Membership Inference Attacks
Abstract: LLMs are trained on large-scale web data, which makes it difficult to grasp the contribution of each individual text. This poses the risk that inappropriate data, such as benchmarks, personal information, and copyrighted texts, leaks into the training data. Membership Inference Attacks (MIAs), which determine whether a given text is included in a model's training data, have attracted attention. Previous studies on MIAs revealed that likelihood-based classification is effective for detecting leaks in LLMs. However, existing methods cannot be applied to some proprietary models, such as ChatGPT or Claude 3, because the likelihood is unavailable to the user. In this study, we propose a Sampling-based Pseudo-Likelihood (\textbf{SPL}) method for MIA (\textbf{SaMIA}) that detects leaks by calculating SPL using only the text generated by an LLM. SaMIA treats the target text as the reference and multiple outputs from the LLM as text samples, computes the degree of $n$-gram match as SPL, and determines the membership of the text in the training data. Even without likelihoods, SaMIA performed on par with existing likelihood-based methods.
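The core computation described above can be sketched in a few lines. The following is a minimal, hedged illustration, not the authors' reference implementation: it assumes a ROUGE-N-style recall as the $n$-gram match score, a simple whitespace tokenizer, and an arbitrary decision threshold; the function names (`ngram_recall`, `sampling_pseudo_likelihood`, `samia_is_member`) and the choice of averaging over samples are illustrative assumptions.

```python
from collections import Counter

def ngram_recall(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N-style recall: fraction of reference n-grams found in the candidate.

    Whitespace tokenization is a simplifying assumption for illustration.
    """
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))
    cand_ngrams = Counter(tuple(cand_tokens[i:i + n])
                          for i in range(len(cand_tokens) - n + 1))
    overlap = sum(min(count, cand_ngrams[g]) for g, count in ref_ngrams.items())
    total = sum(ref_ngrams.values())
    return overlap / total if total else 0.0

def sampling_pseudo_likelihood(reference: str, samples: list[str], n: int = 1) -> float:
    """SPL: mean n-gram match between the reference text and m sampled LLM outputs."""
    return sum(ngram_recall(reference, s, n) for s in samples) / len(samples)

def samia_is_member(reference: str, samples: list[str],
                    threshold: float = 0.5, n: int = 1) -> bool:
    """Classify the reference as a training-set member if SPL exceeds the threshold.

    The threshold value here is a placeholder, not one reported in the paper.
    """
    return sampling_pseudo_likelihood(reference, samples, n) >= threshold
```

In practice the `samples` would be multiple continuations generated by the target LLM from a prefix of the text under test; a memorized text tends to yield continuations that reproduce its $n$-grams, driving SPL up.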