Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models (2404.02936v4)

Published 3 Apr 2024 in cs.CL and cs.LG

Abstract: The problem of pre-training data detection for LLMs has received growing attention due to its implications for critical issues like copyright violation and test data contamination. Despite improved performance, existing methods (including the state of the art, Min-K%) are mostly built on simple heuristics and lack solid, principled foundations. In this work, we propose a novel and theoretically motivated methodology for pre-training data detection, named Min-K%++. Specifically, we present a key insight: through maximum likelihood training, training samples tend to become local maxima of the modeled distribution along each input dimension, which in turn allows us to translate the problem into the identification of local maxima. We then design our method to work under the discrete distribution modeled by LLMs; its core idea is to determine whether the input forms a mode of, or has relatively high probability under, the conditional categorical distribution. Empirically, the proposed method achieves new state-of-the-art performance across multiple settings. On the WikiMIA benchmark, Min-K%++ outperforms the runner-up by 6.2% to 10.5% in detection AUROC averaged over five models. On the more challenging MIMIR benchmark, it consistently improves upon reference-free methods while performing on par with the reference-based method, which requires an extra reference model.
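For intuition, below is a minimal sketch of the scoring rule the abstract describes, assuming a Hugging Face causal LM. The function name min_k_pp_score, the default k=0.2, and the pythia-160m checkpoint are illustrative choices, not the paper's official implementation. Each token's log-probability is standardized by the mean and standard deviation of log-probabilities under the model's next-token (conditional categorical) distribution, and the lowest k% of the per-token scores are averaged.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_pp_score(text: str, model, tokenizer, k: float = 0.2) -> float:
    """Min-K%++-style membership score; higher suggests the text was seen in training."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                               # (1, T, V)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)        # (T-1, V)
    # Log-probability the model assigns to each actual next token.
    token_lp = log_probs.gather(1, ids[0, 1:, None]).squeeze(1)  # (T-1,)
    # Mean and std of log-probs under the conditional categorical
    # distribution p(. | x_<t): mu_t = E_z[log p(z)], sigma_t its std.
    probs = log_probs.exp()
    mu = (probs * log_probs).sum(-1)
    sigma = ((probs * log_probs ** 2).sum(-1) - mu ** 2).clamp_min(0.0).sqrt()
    token_scores = (token_lp - mu) / sigma
    # Aggregate: average the lowest k% of per-token scores.
    n = max(1, int(k * token_scores.numel()))
    return token_scores.topk(n, largest=False).values.mean().item()

# Example usage (any open causal LM works; pythia-160m is just small and fast):
tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
lm = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m").eval()
print(min_k_pp_score("The quick brown fox jumps over the lazy dog.", lm, tok))
```

In practice a threshold on this score (or AUROC computed over a labeled member/non-member set, as in the paper's benchmarks) separates training data from unseen text.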

