DE-COP: Detecting Copyrighted Content in Language Models Training Data (2402.09910v2)
Abstract: How can we detect whether copyrighted content was used in the training process of an LLM, given that the training data is typically undisclosed? We are motivated by the premise that an LLM is likely to identify verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP's core approach is to probe an LLM with multiple-choice questions whose options include both the verbatim text and paraphrases of it. We construct BookTection, a benchmark of excerpts from 165 books published both before and after a model's training cutoff, along with their paraphrases. Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP achieves an average accuracy of 72% in detecting suspect books on fully black-box models, where prior methods reach approximately 4% accuracy. The code and datasets are available at https://github.com/LeiLiLab/DE-COP.
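The multiple-choice probe described in the abstract can be sketched compactly. The snippet below is a minimal illustration of a DE-COP-style test, not the authors' released implementation (see the linked repository for that): the `query_model` callable, the exact prompt wording, the three-paraphrase / four-option setup, and the chance-level comparison are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

LABELS = "ABCD"

def build_question(verbatim: str, paraphrases: List[str]) -> Tuple[str, str]:
    """One multiple-choice probe: the verbatim excerpt is shuffled in among
    its paraphrases, and the correct option letter is returned alongside."""
    options = [verbatim] + paraphrases
    random.shuffle(options)
    correct_letter = LABELS[options.index(verbatim)]
    prompt = (
        "Which of the following passages appears verbatim in the book?\n"
        + "\n".join(f"{LABELS[i]}. {opt}" for i, opt in enumerate(options))
        + "\nAnswer with a single letter (A, B, C, or D)."
    )
    return prompt, correct_letter

def suspect_score(
    excerpts: List[Tuple[str, List[str]]],   # (verbatim excerpt, its paraphrases)
    query_model: Callable[[str], str],       # hypothetical black-box LLM call -> answer text
) -> float:
    """Fraction of probes on which the model picks the verbatim option.
    Scores well above the 25% chance level (with four options) suggest
    the book was part of the model's training data."""
    hits = 0
    for verbatim, paraphrases in excerpts:
        prompt, answer = build_question(verbatim, paraphrases)
        reply = query_model(prompt).strip().upper()
        if reply.startswith(answer):
            hits += 1
    return hits / len(excerpts)
```

In this sketch, a per-book score is just the model's accuracy at spotting the verbatim option across that book's excerpts; how that score is thresholded or aggregated into the paper's reported AUC and accuracy numbers is not reproduced here.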