Sampling-based Pseudo-Likelihood for Membership Inference Attacks (2404.11262v1)

Published 17 Apr 2024 in cs.CL

Abstract: LLMs are trained on large-scale web data, which makes it difficult to grasp the contribution of each text. This poses the risk of leaking inappropriate data such as benchmarks, personal information, and copyrighted texts in the training data. Membership Inference Attacks (MIA), which determine whether a given text is included in the model's training data, have been attracting attention. Previous studies of MIAs revealed that likelihood-based classification is effective for detecting leaks in LLMs. However, the existing methods cannot be applied to some proprietary models like ChatGPT or Claude 3 because the likelihood is unavailable to the user. In this study, we propose a Sampling-based Pseudo-Likelihood (SPL) method for MIA (SaMIA) that calculates SPL using only the text generated by an LLM to detect leaks. SaMIA treats the target text as the reference text and multiple outputs from the LLM as text samples, calculates the degree of n-gram match as SPL, and determines the membership of the text in the training data. Even without likelihoods, SaMIA performed on par with existing likelihood-based methods.

Exploring Leakage Detection in LLMs Through Sampling-based Pseudo-Likelihood

Introduction to SaMIA

LLMs, because of their vast and diverse training datasets, are susceptible to unintended memorization, which can lead to the leakage of sensitive or proprietary information. This paper introduces a novel method, Sampling-based Pseudo-Likelihood (SPL) for Membership Inference Attacks (MIA), referred to as SaMIA. The method is significant because it operates under conditions where traditional likelihood-based MIA methods fail: it does not require access to the model's internal likelihood calculations, which extends its applicability to proprietary models such as ChatGPT or Claude 3.

Mechanisms of SaMIA

SaMIA operates by generating multiple text outputs from a provided text prefix using an LLM and then comparing these generated texts against a reference text. The reference text is the remaining portion of the target text that follows the prefix, and the comparison measures the overlap of n-grams between the generated and reference texts. The degree of this overlap, calculated with ROUGE-N, forms a pseudo-likelihood indicator of whether the target text was part of the model's training dataset:

  • Text Splitting: Each text is split into a prefix and a reference segment.
  • Text Generation: Multiple continuations are generated from the prefix.
  • Overlap Calculation: The overlap of n-grams between the generated texts and the reference segment is computed (a minimal code sketch of the full procedure follows this list).
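
The following is a minimal sketch of this procedure, assuming a Hugging Face transformers causal LM. The half-and-half prefix/reference split, the sampling settings, and the use of unigram ROUGE-1 recall averaged over samples as the SPL score are illustrative choices, not necessarily the paper's exact configuration.

```python
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram overlap (ROUGE-1 recall) between one generated sample and the reference."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    overlap = sum(min(cand_counts[w], ref_counts[w]) for w in ref_counts)
    return overlap / max(sum(ref_counts.values()), 1)


def sampling_pseudo_likelihood(model, tokenizer, text: str, num_samples: int = 10) -> float:
    """Approximate SPL: split the target text into prefix and reference, sample
    continuations of the prefix, and average their overlap with the reference."""
    words = text.split()
    prefix = " ".join(words[: len(words) // 2])      # first half as the prompt
    reference = " ".join(words[len(words) // 2 :])   # second half as the reference

    inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            top_p=0.95,
            num_return_sequences=num_samples,
            max_new_tokens=len(tokenizer(reference)["input_ids"]),
            pad_token_id=tokenizer.eos_token_id,
        )

    scores = []
    for seq in outputs:
        continuation = tokenizer.decode(
            seq[inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        scores.append(rouge1_recall(continuation, reference))
    return sum(scores) / len(scores)  # higher SPL suggests the text was seen in training


# Usage: flag the text as a training-set member if SPL exceeds a tuned threshold.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
spl = sampling_pseudo_likelihood(model, tokenizer, "some candidate document text ...")
is_member = spl > 0.5  # threshold chosen on held-out validation data
```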

Key Results

Experimental evaluation across several public LLMs demonstrated that SaMIA achieves comparable, if not superior, performance to existing likelihood- or loss-based MIA methods. The paper notably highlights the efficacy of SaMIA in scenarios where the likelihood is inaccessible, providing a robust alternative to traditional MIA techniques.

  • Performance Metrics: Using AUC and TPR@10%FPR (the true-positive rate achieved at a 10% false-positive rate), SaMIA displayed strong performance across various models and settings; a brief sketch of how these metrics are computed follows this list.
  • Comparison with Existing Methods: In many cases, SaMIA outperformed or matched state-of-the-art MIA methods that rely on internal model probabilities.
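
As a concrete illustration of these evaluation metrics, the sketch below computes AUC and TPR@10%FPR from per-text membership scores using scikit-learn; the scores and labels shown are placeholder values, not results from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Placeholder membership scores (e.g., SPL values) and ground-truth labels:
# 1 = text was in the training data (member), 0 = non-member.
labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.61, 0.55, 0.22, 0.48, 0.31, 0.18, 0.40, 0.35])

# Area under the ROC curve: threshold-free ranking quality of the attack.
auc = roc_auc_score(labels, scores)

# TPR@10%FPR: the best true-positive rate achievable while the
# false-positive rate stays at or below 10%.
fpr, tpr, _ = roc_curve(labels, scores)
tpr_at_10_fpr = tpr[fpr <= 0.10].max()

print(f"AUC = {auc:.3f}, TPR@10%FPR = {tpr_at_10_fpr:.3f}")
```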

Theoretical and Practical Implications

On a theoretical level, SaMIA enriches the toolkit for studying model leakage in settings without direct model introspection. Practically, it offers a way to audit LLMs for potential data leakage without needing proprietary information about the model. This could be particularly valuable for companies and developers seeking to ensure that their models comply with privacy regulations and intellectual property rights.

Future Directions

While SaMIA represents a substantial advancement in the MIA landscape, the paper suggests several avenues for future research. Enhancing the methodology to further minimize the dependency on model outputs and refining its applicability to different domains and models could provide deeper insights and broader applicability. Additionally, exploring the integration of other statistical comparison measures beyond unigram and bigram matches could offer improvements in detection sensitivity and specificity.

Conclusion

The introduction of SaMIA provides a significant step toward more versatile and accessible means of detecting data leakage in LLMs, particularly in environments where traditional methods are not applicable. Its ability to work without internal model data makes it a promising tool for a wide array of applications in data security and model auditing.

Authors (4)
  1. Masahiro Kaneko (46 papers)
  2. Youmi Ma (7 papers)
  3. Yuki Wata (1 paper)
  4. Naoaki Okazaki (70 papers)
Citations (6)