Evaluating Deep Unlearning in Large Language Models (2410.15153v3)

Published 19 Oct 2024 in cs.CL

Abstract: Machine unlearning is a key requirement of many data protection regulations such as GDPR. Prior work on unlearning has mostly considered superficial unlearning tasks where a single or a few related pieces of information are required to be removed. However, the task of unlearning a fact is much more challenging in recent LLMs, because the facts in LLMs can be deduced from each other. In this work, we investigate whether current unlearning methods for LLMs succeed beyond superficial unlearning of facts. Specifically, we formally propose a framework and a definition for deep unlearning facts that are interrelated. We design the metric, recall, to quantify the extent of deep unlearning. To systematically evaluate deep unlearning, we construct a synthetic dataset EDU-RELAT, which consists of a synthetic knowledge base of family relationships and biographies, together with a realistic logical rule set that connects them. We use this dataset to test four unlearning methods in four LLMs at different sizes. Our findings reveal that in the task of deep unlearning only a single fact, they either fail to properly unlearn with high recall, or end up unlearning many other irrelevant facts. Our dataset and code are publicly available at: https://github.com/wrh14/deep_unlearning.

Summary

  • The paper presents a novel framework, a recall metric, and a synthetic dataset (EDU-RELAT) to evaluate how well unlearning methods achieve deep unlearning in LLMs.
  • Experimental results show that while existing methods achieve superficial unlearning, deep unlearning is challenging: methods either lose significant accuracy (over 20%) or fail to fully remove interconnected facts.
  • The findings indicate that current unlearning methods are insufficient for data privacy compliance, underscoring the need for techniques that account for the intricate factual relationships in LLMs.

Evaluating Deep Unlearning in LLMs

The paper presents an academic investigation into machine unlearning, a crucial component for adhering to data protection regulations such as the GDPR. The focus of this research is "deep unlearning" in LLMs, a concept that challenges existing paradigms of data removal in AI systems. Unlike superficial unlearning, which removes only a single fact or a few related pieces of information, deep unlearning demands the elimination of the target fact together with all other facts from which the target can be logically deduced.
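To see why this is harder than deleting one fact, consider a minimal sketch (the facts, the rule, and all names below are illustrative, not drawn from the paper's codebase): if a family-relationship rule can re-derive the deleted fact from retained facts, superficial deletion fails.

```python
def deducible(facts: set, target: tuple) -> bool:
    """Forward-chain one family-relationship rule to a fixed point,
    then check whether the target fact is (re-)derivable."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        # Rule: parent(P, X) and parent(P, Y) with X != Y  =>  sibling(X, Y)
        parents = [(p, c) for (rel, p, c) in derived if rel == "parent"]
        for p1, c1 in parents:
            for p2, c2 in parents:
                if p1 == p2 and c1 != c2 and ("sibling", c1, c2) not in derived:
                    derived.add(("sibling", c1, c2))
                    changed = True
    return target in derived

kb = {
    ("parent", "Alice", "Bob"),
    ("parent", "Alice", "Carol"),
    ("sibling", "Bob", "Carol"),
}
target = ("sibling", "Bob", "Carol")

kb.discard(target)              # superficial unlearning: drop only the target
print(deducible(kb, target))    # True: the fact is still deducible from the parent facts
```

Deep unlearning therefore has to remove at least one fact from every deduction path that leads to the target.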

Summary of Methodology and Findings

The authors propose a testing framework together with a recall metric to evaluate whether unlearning methods succeed at deep unlearning in LLMs. They introduce a synthetic dataset, EDU-RELAT, as a benchmark: it comprises synthetic biographies and family relationships, connected by a realistic set of logical rules, so that unlearning methods are tested on deeply interconnected fact sets.
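Informally, recall measures how much of a set of facts that must be removed (so the target can no longer be deduced) was actually unlearned, taking the best match over minimal such sets. A minimal sketch of that ratio, assuming the minimal deep-unlearning sets have already been enumerated (the function and variable names are illustrative):

```python
def recall(unlearned: set, minimal_sets: list) -> float:
    """Deep-unlearning recall: overlap with the best-matching minimal
    deep-unlearning set. A value of 1.0 means some minimal set was fully
    removed, so the target is no longer deducible from what remains."""
    return max(len(unlearned & s) / len(s) for s in minimal_sets)

# Continuing the toy example: either parent fact, together with the target,
# forms a minimal set whose removal blocks every deduction of the target.
minimal_sets = [
    {("sibling", "Bob", "Carol"), ("parent", "Alice", "Bob")},
    {("sibling", "Bob", "Carol"), ("parent", "Alice", "Carol")},
]
unlearned = {("sibling", "Bob", "Carol")}   # only the target was removed
print(recall(unlearned, minimal_sets))      # 0.5: deep unlearning is incomplete
```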

Four unlearning methods (Gradient Ascent, Negative Preference Optimization, Task Vector, and Who's Harry Potter) are tested on four LLMs: Phi-1.5, GPT2-XL, Llama2-7b, and Llama3-8b. Experimental results suggest that while most methods superficially unlearn the targeted information without significant loss of accuracy, deep unlearning poses significant challenges: methods either leave behind facts from which the removed fact can still be deduced, or over-remove and destroy irrelevant factual data.
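As one concrete example, gradient ascent, the simplest of the four methods, fine-tunes the model to increase its loss on the fact to be forgotten. A minimal sketch, assuming a Hugging Face causal LM (the model choice, prompt, and hyperparameters are illustrative, not the paper's exact setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model and fact; the paper's exact prompts and settings differ.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")
tok = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

fact = "Bob is the sibling of Carol."
ids = tok(fact, return_tensors="pt").input_ids

model.train()
for _ in range(10):                              # illustrative number of ascent steps
    loss = model(input_ids=ids, labels=ids).loss  # standard next-token NLL on the fact
    (-loss).backward()                            # ascend instead of descend
    opt.step()
    opt.zero_grad()
```

The deep-unlearning question is precisely whether steps like these remove not just the target sentence but every retained fact from which it could be re-deduced.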

Results and Numerical Insights

The paper quantitatively establishes that existing unlearning methods do not effectively meet deep unlearning requirements. In particular, when deep unlearning does succeed (i.e., recall is high), the measured accuracy loss exceeds 20%. The paper also identifies a positive correlation between model size and deep unlearning ability: larger models such as Llama3-8b demonstrated a better accuracy-recall trade-off than smaller models.

Implications and Future Directions

This research has important implications for domains where compliance with data privacy standards is critical. Given the shortcomings of present unlearning methodologies, the paper underscores the need for more advanced unlearning techniques that account for the intricate relational structure of factual data.

The authors suggest that future work explore methods incorporating a more sophisticated understanding of fact interrelations and deduction, including both black-box and white-box settings in which the availability of the rule set and knowledge base varies. Such work would support more robust unlearning mechanisms capable of handling the complex fact dependencies present in LLMs.

In conclusion, the paper highlights the limitations of current unlearning strategies and calls for a deeper computational understanding of fact dependencies within LLMs, underscoring the room for innovation in machine unlearning.