Data Contamination Calibration for Black-box LLMs (2405.11930v2)

Published 20 May 2024 in cs.LG

Abstract: The rapid advancement of LLMs is tightly coupled with the growth of their training data. However, unchecked ultra-large-scale training sets introduce a series of potential risks, such as data contamination, i.e., benchmark data being used for training. In this work, we propose a holistic method named Polarized Augment Calibration (PAC), along with a new to-be-released dataset, to detect contaminated data and diminish the contamination effect. PAC extends the popular Membership Inference Attack (MIA) from the machine-learning community by forming a more global objective for detecting training data, thereby clarifying invisible training data. As a pioneering work, PAC is plug-and-play and can be integrated with most (if not all) current white- and black-box LLMs. In extensive experiments, PAC outperforms existing methods by at least 4.5% on data contamination detection across more than 4 dataset formats and more than 10 base LLMs. Moreover, our application in real-world scenarios highlights the prominent presence of contamination and related issues.
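The abstract describes PAC only at a high level, so the sketch below illustrates the generic MIA-style signal such contamination detectors build on: a model assigns noticeably higher likelihood to the exact wording of a benchmark example it was trained on than to semantically similar rewrites. This is not the paper's PAC algorithm; the model name (gpt2), the paraphrase-based reference score, and the margin threshold are illustrative assumptions.

```python
# Minimal sketch of an MIA-style contamination check (NOT the PAC method itself).
# Assumes a model whose per-token log-probabilities are accessible; black-box APIs
# that return logprobs could be used the same way in principle.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()  # HF loss is the mean negative log-likelihood

def looks_contaminated(sample: str, paraphrases: list[str], margin: float = 0.3) -> bool:
    """Flag `sample` if the model prefers its exact wording over paraphrases
    by more than `margin` nats per token -- a simple calibrated MIA signal."""
    original = avg_log_likelihood(sample)
    reference = sum(avg_log_likelihood(p) for p in paraphrases) / len(paraphrases)
    return original - reference > margin
```

In this toy setup, the paraphrases play the role of a calibration reference (comparable to neighbourhood- or difficulty-calibrated MIAs); PAC's actual calibration and augmentation strategy differs and is detailed in the paper.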
