Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models (2310.15007v2)

Published 23 Oct 2023 in cs.CL, cs.CR, and cs.LG

Abstract: With LLMs poised to become embedded in our daily lives, questions are starting to be raised about the data they learned from. These questions range from potential bias or misinformation LLMs could retain from their training data to questions of copyright and fair use of human-generated text. However, while these questions emerge, developers of the recent state-of-the-art LLMs become increasingly reluctant to disclose details on their training corpus. We here introduce the task of document-level membership inference for real-world LLMs, i.e. inferring whether the LLM has seen a given document during training or not. First, we propose a procedure for the development and evaluation of document-level membership inference for LLMs by leveraging commonly used data sources for training and the model release date. We then propose a practical, black-box method to predict document-level membership and instantiate it on OpenLLaMA-7B with both books and academic papers. We show our methodology to perform very well, reaching an AUC of 0.856 for books and 0.678 for papers. We then show our approach to outperform the sentence-level membership inference attacks used in the privacy literature for the document-level membership task. We further evaluate whether smaller models might be less sensitive to document-level inference and show OpenLLaMA-3B to be approximately as sensitive as OpenLLaMA-7B to our approach. Finally, we consider two mitigation strategies and find the AUC to slowly decrease when only partial documents are considered but to remain fairly high when the model precision is reduced. Taken together, our results show that accurate document-level membership can be inferred for LLMs, increasing the transparency of technology poised to change our lives.

Citations (18)

Summary

  • The paper introduces a novel black-box approach for document-level membership inference, achieving an AUC of 0.856 for books and 0.678 for papers.
  • It aggregates token-level predictions into a document-level meta-classifier, effectively tracing even minimal training data presence.
  • The findings highlight critical implications for AI transparency and training practices by demonstrating identifiable document memorization in LLMs.

Document-level Membership Inference for LLMs

Introduction to Document-level Membership Inference

The proliferation of LLMs has raised questions about the datasets these models are trained on, ranging from concerns about bias and misinformation to copyright infringements. Addressing these questions necessitates a method to determine whether a specific document was part of an LLM's training dataset. This research introduces a novel approach to document-level membership inference, aimed at uncovering whether an LLM has been trained with a particular document. A practical, black-box method tailored for this purpose was developed and evaluated using OpenLLaMA-7B, focusing on books and academic papers.

Methodology Overview

The document-level membership inference strategy works in stages: the model is queried for token-level predictions on the document, these predictions are normalized, the normalized values are aggregated to the document level, and a meta-classifier then predicts whether the document was part of the training dataset. This approach achieves an Area Under the Curve (AUC) of 0.856 for books and 0.678 for papers, and it outperforms the sentence-level membership inference attacks commonly used in the privacy literature for the same task.
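To make the pipeline concrete, a minimal sketch is given below. The model checkpoint, the frequency-based normalization, the histogram features, and the logistic-regression meta-classifier are simplifying assumptions chosen for illustration; the paper's own normalization schemes and classifier choices may differ.

```python
# Minimal sketch of a document-level membership inference pipeline:
# token-level predictions -> normalization -> document-level aggregation -> meta-classifier.
from collections import Counter

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "openlm-research/open_llama_7b"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()


def token_logprobs(text: str, max_len: int = 2048):
    """Query the model for the log-probability it assigns to each true next token."""
    ids = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_len
    ).input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    targets = ids[:, 1:]
    tok_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return tok_lp.squeeze(0).cpu().numpy(), targets.squeeze(0).cpu().numpy()


def normalize_by_frequency(tok_lp, tok_ids, token_counts: Counter):
    """Normalize each token's log-prob by how common that token is overall
    (a simplified stand-in for the paper's normalization step)."""
    total = sum(token_counts.values())
    expected = np.array(
        [np.log((token_counts.get(int(t), 0) + 1) / total) for t in tok_ids]
    )
    return tok_lp - expected


def document_features(norm_lp, n_bins: int = 20):
    """Aggregate normalized token-level values into one fixed-length vector per document."""
    hist, _ = np.histogram(norm_lp, bins=n_bins, range=(-15.0, 15.0), density=True)
    return np.concatenate([hist, [norm_lp.mean(), norm_lp.std()]])


def train_meta_classifier(features: np.ndarray, labels: np.ndarray):
    """Meta-classifier over document features; labels come from documents with
    known membership (e.g., published before vs. after the model's training data cutoff)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```

Given features from documents with known membership labels, an AUC like the ones reported above would be computed from the meta-classifier's predicted probabilities on held-out documents (e.g., with sklearn.metrics.roc_auc_score).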

Practical Implications and Theoretical Contributions

The findings reveal that even documents that comprise a minuscule fraction of the training data can be traced in the LLM's knowledge base. This implies a significant level of memorization by LLMs, challenging the presumption that the vast data throughput of these models precludes the retention of identifiable, individual document information. Practically, this methodology can serve as an auditing tool for content creators, organizations, and regulators to ascertain whether a piece of content was used in training an LLM. Theoretically, it underscores the nuanced understanding required regarding how LLMs process, retain, and replicate training data inputs.

Speculations on Future Developments

The ability to perform document-level membership inference may influence the development and training practices of future LLMs. Developers might need to adopt more transparent practices regarding their training datasets or implement techniques to mitigate undesired memorization. Furthermore, this research could spur the development of more sophisticated inference methods and encourage a reevaluation of privacy and copyright considerations in generative AI.

Conclusion

This paper constitutes a significant step toward transparent AI practices by enabling an understanding of what content LLMs are trained on. By achieving accurate document-level membership inference, it opens the door to greater scrutiny and accountability in the development of AI technologies that are becoming increasingly integrated into our daily lives. The methodologies and findings not only offer a practical tool for various stakeholders but also enrich the discourse on the implications of training data usage in LLMs.
