Knowledge Unlearning for Mitigating Privacy Risks in Language Models (2210.01504v2)

Published 4 Oct 2022 in cs.CL

Abstract: Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply performing gradient ascent on target token sequences is effective at forgetting them with little to no degradation of general language modeling performances for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By showing comparisons with a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori while being much more efficient and robust. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.

Authors (7)
  1. Joel Jang (30 papers)
  2. Dongkeun Yoon (8 papers)
  3. Sohee Yang (23 papers)
  4. Sungmin Cha (26 papers)
  5. Moontae Lee (54 papers)
  6. Lajanugen Logeswaran (30 papers)
  7. Minjoon Seo (82 papers)
Citations (142)

Summary

Knowledge Unlearning for Mitigating Privacy Risks in Language Models

The paper presents an approach termed "knowledge unlearning," aimed at mitigating privacy risks inherent in pretrained language models (LMs). Traditional methods for addressing privacy issues rely on data preprocessing or differential privacy techniques, both of which require retraining the model, an impractical demand given the substantial resources large models consume. Instead, the authors propose a post hoc method that applies knowledge unlearning to reduce privacy risks without retraining the LM.

The methodology simply performs gradient ascent on the target token sequences: the usual language-modeling loss is negated so that updates push the model away from those sequences, causing it to forget them. This mechanism retains, and in larger models sometimes even enhances, general performance. Unlearning is more effective when the target data is processed sequentially, chunk by chunk, rather than all at once, and its success depends strongly on the domain of the data being unlearned.
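
A minimal sketch of this procedure is shown below, assuming a Hugging Face causal LM. The model name, learning rate, and number of epochs are illustrative assumptions rather than the paper's exact configuration; the authors' released code (linked above) remains the authoritative implementation.

```python
# Sketch of gradient-ascent unlearning on target token sequences.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"          # smallest model family studied
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Sequences the model should forget (placeholder content).
target_texts = ["<token sequence containing private information>"]

model.train()
for epoch in range(5):                          # a few iterations often suffice
    for text in target_texts:
        batch = tokenizer(text, return_tensors="pt")
        outputs = model(**batch, labels=batch["input_ids"])
        loss = -outputs.loss                    # negate LM loss => gradient ascent
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```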

Key experiments on GPT-Neo models (125M to 2.7B parameters) demonstrate that knowledge unlearning safeguards models against training data extraction attacks. The defense is quantified with two metrics: Extraction Likelihood (EL) and Memorization Accuracy (MA). EL measures how much of a target sequence the model reproduces when prompted with prefixes of varying length, while MA measures the fraction of the sequence's next tokens the model predicts exactly, i.e., how strongly it has memorized the sequence.
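
As a concrete illustration, a memorization-accuracy check can be implemented as the fraction of next-token predictions that exactly match the target sequence under teacher forcing. The sketch below is an assumed paraphrase of that idea in PyTorch for a single sequence, not the paper's reference implementation; EL would additionally generate continuations from prefixes of varying length and measure their overlap with the true suffix.

```python
# Hedged sketch of Memorization Accuracy (MA): the fraction of next tokens
# the model predicts exactly when teacher-forced on a target sequence.
import torch

@torch.no_grad()
def memorization_accuracy(model, input_ids: torch.LongTensor) -> float:
    """input_ids: shape (1, T), a single tokenized target sequence."""
    logits = model(input_ids).logits           # (1, T, vocab_size)
    preds = logits[:, :-1].argmax(dim=-1)      # greedy prediction for tokens 2..T
    targets = input_ids[:, 1:]                 # ground-truth next tokens
    return (preds == targets).float().mean().item()
```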

The research produces several consequential findings:

  1. Empirical Comparisons: The paper showcases that knowledge unlearning provides stronger privacy guarantees than baselines that use either deduplication data preprocessing or differential privacy decoding. This is achieved with reduced computational costs and minimal performance degradation.
  2. Privacy Guarantees: By employing knowledge unlearning, the models manage to reach a specified 'Forgetting Threshold' for both EL and MA metrics, indicating a state where sequences are considered forgotten.
  3. Performance Analysis: Although knowledge unlearning results in a higher theoretical perplexity indicating a more uniform token distribution, practical performance improvements on NLP tasks suggest the retention of the most probable outputs is not adversely affected.
  4. Sequential Unlearning Advantage: The paper reveals that sequentially unlearning data in smaller chunks prevents significant performance degradation and suggests a generalization effect where later chunks of data are forgotten more swiftly (a sketch of this chunked procedure follows the list).
  5. Domain-Dependent Unlearning: The ease of unlearning is contingent on the domain and structure of data. Structured data like code appear easier to forget than less structured narrative data.
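
Points 2 and 4 suggest a simple control loop: split the target data into chunks and run gradient ascent on each chunk until its EL and MA fall below the forgetting thresholds, before moving on to the next chunk. The sketch below is a hedged illustration of that loop; the helper names (unlearn_step, el, ma) and the threshold convention are assumptions introduced here, not the paper's API.

```python
# Hedged sketch of sequential unlearning with a forgetting-threshold check.
# `unlearn_step` is assumed to perform one gradient-ascent update on a single
# sequence (as in the earlier sketch); `el` and `ma` are assumed metric
# functions returning Extraction Likelihood and Memorization Accuracy.
from typing import Callable, List, Sequence

def sequential_unlearn(
    model,
    chunks: Sequence[List],               # target sequences split into chunks
    unlearn_step: Callable,               # (model, seq) -> None, one ascent step
    el: Callable, ma: Callable,           # (model, seq) -> float
    el_threshold: float, ma_threshold: float,
    max_epochs: int = 10,
):
    for chunk in chunks:                  # unlearn one chunk at a time
        for _ in range(max_epochs):
            if all(el(model, s) <= el_threshold and ma(model, s) <= ma_threshold
                   for s in chunk):       # chunk counts as forgotten: stop early
                break
            for s in chunk:
                unlearn_step(model, s)
    return model
```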

The implications of this paper are significant for both practical applications and theoretical advances in AI. Practically, it offers a computationally economical route to comply with privacy regulations post hoc, avoiding the expensive process of retraining models. Theoretically, it adds a new dimension to the understanding of model memorization dynamics and suggests paths for further research on privacy-respecting LMs.

Future research should explore the broader application of knowledge unlearning across architectures and domains, study the impact of different data types and attack settings, and integrate these insights into robust, privacy-oriented public AI services. Investigating how knowledge unlearning interacts with other privacy-preserving techniques could also yield more comprehensive frameworks for safeguarding sensitive information within AI models.