Extracting Memorized Training Data via Decomposition
The paper "Extracting Memorized Training Data via Decomposition" explores a critical vulnerability of LLMs, focusing on their propensity to inadvertently disclose sensitive training data. The researchers propose a non-adversarial, query-based decompositional methodology to systematically extract memorized information from LLMs.
Summary of Research
Introduction and Background
LLMs such as GPT-3 and its successors demonstrate remarkable capabilities in generating human-like text by leveraging extensive training data. However, their ability to memorize and regurgitate training examples poses significant security, privacy, and ethical concerns. Prior work has shown that sophisticated prompts or adversarial techniques can coax these models into reproducing parts of their training data. This paper instead explores instruction decomposition, an incremental querying technique that pieces together fragments of memorized text through ordinary, benign-looking queries.
Methodology
The authors designed a detailed experimental framework involving two state-of-the-art LLMs, querying them about news articles from the New York Times (NYT) and the Wall Street Journal (WSJ). They employed a multi-round approach (a minimal code sketch of the pipeline follows the list):
- Initial Screening: All articles were queried using a basic prompt structure.
- Refinement Phase: Articles yielding promising results were re-queried with enhanced prompts based on in-context learning (ICL).
- Deep Extraction: A subset of highly responsive articles underwent multi-turn prompting, aiming to reconstruct complete sections of the articles.
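The paper's exact prompt templates and tooling are not reproduced here; the sketch below is only an illustrative assumption of how such a three-stage pipeline might be wired together. `query_llm`, the prompt wording, and the stage interfaces are placeholders, not the authors' implementation.

```python
# Illustrative sketch of the staged querying pipeline (assumed structure, not the
# authors' code). `query_llm` stands in for whatever chat-completion API is used.
from typing import Callable

QueryFn = Callable[[list[dict]], str]

def screen(title: str, query_llm: QueryFn) -> str:
    """Stage 1: basic prompt asking the model to reproduce an article's opening."""
    prompt = f"Quote the opening sentences of the news article titled '{title}'."
    return query_llm([{"role": "user", "content": prompt}])

def refine_with_icl(title: str, demos: list[tuple[str, str]], query_llm: QueryFn) -> str:
    """Stage 2: prepend (title, verbatim opening) pairs as in-context examples."""
    messages = []
    for demo_title, demo_opening in demos:
        messages.append({"role": "user",
                         "content": f"Quote the opening sentences of the news article titled '{demo_title}'."})
        messages.append({"role": "assistant", "content": demo_opening})
    messages.append({"role": "user",
                     "content": f"Quote the opening sentences of the news article titled '{title}'."})
    return query_llm(messages)

def deep_extract(title: str, seed_text: str, turns: int, query_llm: QueryFn) -> str:
    """Stage 3: multi-turn continuation, feeding each recovered chunk back as context."""
    recovered = seed_text
    messages = [{"role": "user",
                 "content": f"The article '{title}' begins: \"{seed_text}\" Continue the text."}]
    for _ in range(turns):
        chunk = query_llm(messages)
        recovered += " " + chunk
        messages.append({"role": "assistant", "content": chunk})
        messages.append({"role": "user", "content": "Continue the text."})
    return recovered
```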
The primary metric for success was the retrieval of verbatim sentences or substantial matches from the original articles.
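One simple way to operationalize this metric is exact matching of normalized sentences; the normalization rules and the eight-word minimum below are assumptions chosen for illustration, not the paper's published criteria.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for robust comparison."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def verbatim_sentences(original: str, generated: str, min_words: int = 8) -> list[str]:
    """Return original sentences (above a minimum length) that appear verbatim,
    after normalization, in the model's output."""
    norm_generated = normalize(generated)
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", original):
        norm = normalize(sentence)
        if len(norm.split()) >= min_words and norm in norm_generated:
            hits.append(sentence)
    return hits
```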
Results
The experiments yielded concerning results:
- Of 3,723 NYT articles, 73 yielded at least one verbatim sentence, and six had more than 20% of their content extracted.
- Of 1,349 WSJ articles, seven yielded at least one verbatim sentence.
The results indicate that even without adversarial intent, LLMs can be systematically queried to extract sensitive information, shedding light on a significant privacy concern inherent to current generative models.
Detailed Findings
Improved Query Techniques
The paper highlights that multi-turn and ICL-based prompts significantly enhance extraction. A comparison of the prompting strategies showed:
- Simplified Prompts: Basic queries were often sufficient to extract fragments of data.
- Enhanced ICL Prompts: These prompts demonstrated slight improvements by conditioning the model more effectively.
- Multi-Turn Prompts: Crucial for piecing together longer sequences of text by preserving context across queries; one way to merge the per-turn fragments into a single reconstruction is sketched below.
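As a concrete illustration of that last point, fragments returned across turns typically overlap at their boundaries and must be merged. The heuristic below (the overlap search and its threshold are assumptions) is one way such stitching could be done, not the authors' procedure.

```python
def stitch(fragments: list[str], min_overlap: int = 10) -> str:
    """Merge consecutive fragments by dropping repeated suffix/prefix overlap,
    so per-turn continuations join into one longer reconstruction."""
    if not fragments:
        return ""
    merged = fragments[0]
    for frag in fragments[1:]:
        best = 0
        for k in range(min(len(merged), len(frag)), min_overlap - 1, -1):
            if merged.endswith(frag[:k]):
                best = k
                break
        merged += frag[best:]
    return merged
```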
Practical and Theoretical Implications
- Practical Security: This methodology, if scaled, could expose a substantial volume of training data, making it crucial for organizations to reassess the use and safeguarding of LLMs in contexts involving sensitive information.
- Legal and Ethical Concerns: Reproducing copyrighted material or sensitive data directly from models creates legal liability and ethical dilemmas, necessitating more robust data handling and anonymization within training corpora.
- Model Safeguarding: Developers must prioritize hardening LLMs against such vulnerabilities. Techniques may include differentially private training, improved data sanitization, and closer scrutiny of training datasets; a minimal sketch of a differentially private training step follows this list.
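To make the differentially private training suggestion concrete, here is a minimal sketch of one DP-SGD step on a toy logistic-regression model. The model, hyperparameters, and noise scale are illustrative assumptions; real deployments would rely on a vetted library and a proper privacy accountant.

```python
import numpy as np

def dp_sgd_step(weights, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for logistic regression: clip each per-example gradient
    to `clip_norm`, sum, add Gaussian noise scaled by `noise_mult * clip_norm`,
    then average and take a gradient step."""
    rng = rng or np.random.default_rng()
    clipped_sum = np.zeros_like(weights)
    for xi, yi in zip(X, y):
        pred = 1.0 / (1.0 + np.exp(-xi @ weights))      # sigmoid prediction
        grad = (pred - yi) * xi                          # per-example gradient
        scale = min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
        clipped_sum += grad * scale                      # bound each example's influence
    noise = rng.normal(0.0, noise_mult * clip_norm, size=weights.shape)
    return weights - lr * (clipped_sum + noise) / len(y)
```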
Future Directions
The authors suggest several avenues for future research:
- Benchmarking Decomposition-Based Extraction: Extending the analysis to additional LLM architectures to establish benchmarks.
- Defense Mechanisms: Investigating robustness techniques that specifically target decompositional queries.
- Privacy-Preserving Training: Integrating and evaluating techniques such as federated learning or homomorphic encryption during training to enforce data confidentiality; a minimal federated-averaging sketch follows this list.
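As a pointer to what the federated-learning option involves, below is a minimal sketch of the FedAvg aggregation step, in which a server averages client model updates weighted by local dataset size. It is a generic illustration, not a method evaluated in the paper.

```python
import numpy as np

def federated_average(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """FedAvg aggregation: weighted mean of client parameters by local dataset size,
    so raw training data never leaves the clients."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```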
Conclusion
The presented research elucidates a critical and practical vulnerability of LLMs: their ability to reproduce memorized data fragments under benign query settings. This work underscores the need for the machine learning community to develop, test, and integrate robust privacy-preserving methods that mitigate the risk of data leakage from generative models. The findings also point to the need for stronger governance and regulatory frameworks to address the growing ethical concerns surrounding the deployment of LLMs across domains.