Deduplicating Training Data Mitigates Privacy Risks in Language Models (2202.06539v3)

Published 14 Feb 2022 in cs.CR, cs.CL, and cs.LG

Abstract: Past work has shown that LLMs are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. We first show that the rate at which LLMs regenerate training sequences is superlinearly related to a sequence's count in the training set. For instance, a sequence that is present 10 times in the training data is on average generated ~1000 times more often than a sequence that is present only once. We next show that existing methods for detecting memorized sequences have near-chance accuracy on non-duplicated training sequences. Finally, we find that after applying methods to deduplicate training data, LLMs are considerably more secure against these types of privacy attacks. Taken together, our results motivate an increased focus on deduplication in privacy-sensitive applications and a reevaluation of the practicality of existing privacy attacks.

Citations (233)

Summary

  • The paper shows that duplicate training sequences are memorized disproportionately, with tenfold duplication leading to a nearly 1000-fold increase in output frequency.
  • It finds that deduplication reduces training data leakage by up to 20 times, effectively mitigating privacy vulnerabilities in language models.
  • The study also highlights that larger models and prolonged training amplify memorization, underscoring the need for robust, privacy-preserving data handling strategies.

Deduplicating Training Data Mitigates Privacy Risks in Language Models

The paper "Deduplicating Training Data Mitigates Privacy Risks in LLMs" by Kandpal et al. addresses an important vulnerability in LLMs (LMs): their susceptibility to privacy attacks due to the memorization of training data. The research sheds light on how duplication in training datasets significantly contributes to the success of these attacks, proposing deduplication as a method to enhance data privacy.

Key Findings

  1. Superlinear Relationship in Regeneration: The authors show that how often a sequence appears in generated text is superlinearly related to its duplication count in the training data. Specifically, a sequence duplicated 10 times in the dataset is generated approximately 1000 times more often than a sequence appearing only once (over this range, roughly a cubic dependence, since a 10x increase in duplicates maps to about a 10^3-fold increase in generations). Sequences with higher duplication are therefore memorized disproportionately, posing a significant privacy risk.
  2. Effectiveness of Membership Inference Attacks: The paper shows that existing methods for detecting memorized sequences have near-chance accuracy on non-duplicated training data, suggesting that these attacks largely exploit duplication rather than true memorization. On duplicated sequences, however, approaches such as the reference-model attack achieve markedly higher AUROC, indicating a better ability to detect duplicated content (a minimal sketch of this scoring appears after this list).
  3. Impact of Deduplication: After applying deduplication methods to the training data, LMs become significantly more resistant to privacy attacks: deduplicated models emit roughly 20 times less training data. Notably, the reference-model method still performs well after deduplication, suggesting that it may capture forms of memorization beyond simple duplication (a toy deduplication sketch also follows this list).
  4. Training Dynamics and Model Size: Larger models and those trained for more epochs are shown to memorize training data to a greater extent, exacerbating the issue. Sampling methods also impact the regeneration of training sequences, with more restrictive sampling (e.g., lower k in top-k sampling) leading to greater regeneration rates.
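
To make the reference-model attack in finding 2 concrete, here is a minimal sketch of the scoring it relies on, assuming off-the-shelf GPT-2 checkpoints stand in for the target and reference models; the specific models, datasets, and decision thresholds used in the paper differ.

```python
# Minimal sketch of reference-model membership inference scoring.
# Assumption: "gpt2-large" stands in for the trained target model and
# "gpt2" for the smaller calibration (reference) model; the paper's
# actual models, data, and thresholds differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_nll(model, tokenizer, text):
    """Average per-token negative log-likelihood of `text` under `model`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return loss.item()

tok = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2-large")   # suspected memorizer
reference = AutoModelForCausalLM.from_pretrained("gpt2")      # calibration model

candidate = "Some sequence sampled from the target model."
# A candidate the target fits much better than the reference is flagged as
# likely memorized training data; the threshold below is illustrative only.
score = mean_nll(reference, tok, candidate) - mean_nll(target, tok, candidate)
print(f"score={score:.3f}, flagged={score > 0.5}")
```

For finding 3, the paper relies on deduplication tools from prior work (exact-substring and near-duplicate removal); the following toy sketch, which drops any document sharing a 50-token window with an earlier one, only illustrates the idea and is not the authors' implementation.

```python
# Toy n-gram-based deduplication: a document is dropped if it shares any
# `window`-token span with a document already kept. Real pipelines use
# suffix-array exact-substring matching and near-duplicate detection.
from typing import Iterable, List

def dedupe(docs: Iterable[List[str]], window: int = 50) -> List[List[str]]:
    seen = set()
    kept = []
    for tokens in docs:
        grams = {tuple(tokens[i:i + window])
                 for i in range(max(1, len(tokens) - window + 1))}
        if grams & seen:        # overlaps an already-kept document: drop it
            continue
        seen |= grams
        kept.append(tokens)
    return kept

corpus = [doc.split() for doc in ["a b c d e", "a b c d e", "f g h i j"]]
print(len(dedupe(corpus, window=5)))   # -> 2: the exact repeat is removed
```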

Theoretical and Practical Implications

The research emphasizes the need for effective deduplication as a privacy-preserving measure in training regimes, especially given the tendency of modern LMs to train on vast web-scraped datasets. On a theoretical level, the findings prompt a re-evaluation of privacy attack models, suggesting that much of their success lies in leveraging duplicate sequences rather than inferring unique training samples.

Moreover, the paper underscores the need to revisit common assumptions about memorization in LMs. Its observation of a superlinear pattern in regeneration rates invites further investigation into the memorization dynamics of deep learning models.

Future Directions

Future research is encouraged to explore broader definitions of duplication beyond exact matches, to include near-duplicates or semantically similar fragments, which could equally affect model memorization and privacy. Additionally, the authors propose examining how deduplication influences privacy in domains beyond text, such as images and audio, potentially uncovering universal patterns of data leakage across modalities.
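
As a hypothetical illustration of such a broader notion of duplication, the sketch below scores a pair of documents by Jaccard similarity over token shingles; the paper itself studies only exact duplicates, and the shingle size and threshold here are arbitrary choices for illustration.

```python
# Near-duplicate scoring via Jaccard similarity over token n-gram "shingles".
def shingles(tokens, n=3):
    """Set of token n-grams for one document."""
    return {tuple(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

doc_a = ("deduplication removes repeated sequences from the training set "
         "before the language model is trained").split()
doc_b = ("deduplication removes repeated sequences from the training corpus "
         "before the language model is trained").split()

# Pairs whose similarity exceeds a chosen threshold (e.g. 0.5) would be
# collapsed into a single copy before training.
print(f"near-duplicate similarity: {jaccard(shingles(doc_a), shingles(doc_b)):.2f}")  # 0.60
```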

Because deduplication mitigates but does not eliminate memorization, further development of privacy-preserving techniques, and their integration with differential privacy guarantees and adversarial regularization, remains a promising area of work. Future efforts might focus on adapting these insights to operational AI systems, ensuring robustness against privacy threats while maintaining model effectiveness across diverse applications.