Machine Unlearning Six-Way Evaluation for LLMs
In large language model (LLM) development, managing large training datasets that may contain private or copyrighted material has become a critical concern. This paper, authored by Weijia Shi et al., introduces MUSE (Machine Unlearning Six-Way Evaluation), a systematic benchmark for measuring how effectively unlearning algorithms remove such data from LLMs. The work responds to gaps in existing evaluations, which are often narrow and therefore fail to capture the multifaceted demands of both data owners and model deployers.
Contributions and Evaluation Criteria
The paper proposes MUSE, a framework that evaluates unlearning algorithms against six distinct criteria (a sketch of one such check follows the list):
- No Verbatim Memorization: Preventing the model from reproducing exact sequences present in the data intended for unlearning.
- No Knowledge Memorization: Ensuring the model does not retain factual knowledge from the unlearned data.
- No Privacy Leakage: Protecting against the inference of whether a specific piece of data was part of the training set.
- Utility Preservation: Maintaining model performance on data not targeted for unlearning.
- Scalability: Effectively handling varying sizes of data removal requests.
- Sustainability: Accommodating multiple sequential unlearning requests without degradation in model performance.
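To make the first criterion concrete, here is a minimal sketch of a verbatim-memorization probe: prompt the model with the opening tokens of a forget-set passage and measure the overlap between its continuation and the true continuation. The checkpoint name, token budgets, and the rouge_score dependency are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (illustrative, not the paper's code): probe verbatim memorization
# by prompting with the start of a forget-set passage and scoring the overlap
# between the model's continuation and the true continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from rouge_score import rouge_scorer  # assumed dependency for ROUGE-L overlap

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

def verbatim_memorization_score(passage: str, prompt_tokens: int = 100, cont_tokens: int = 100) -> float:
    """ROUGE-L F1 between the model's greedy continuation and the true continuation."""
    ids = tokenizer(passage, return_tensors="pt").input_ids[0]
    prompt_ids = ids[:prompt_tokens].unsqueeze(0).to(model.device)
    true_continuation = tokenizer.decode(ids[prompt_tokens:prompt_tokens + cont_tokens])
    with torch.no_grad():
        out = model.generate(prompt_ids, max_new_tokens=cont_tokens, do_sample=False)
    generated = tokenizer.decode(out[0, prompt_ids.shape[1]:], skip_special_tokens=True)
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(true_continuation, generated)["rougeL"].fmeasure
```

Averaging this score over forget-set passages before and after unlearning gives a simple proxy for the no-verbatim-memorization criterion: the lower the post-unlearning score, the less the model regurgitates.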
Methodology
The authors evaluate eight unlearning algorithms, built from four core methods and their regularized variants, across these six criteria. The core methods are:
- Gradient Ascent (GA): Directly maximizes the loss on the forget set.
- Negative Preference Optimization (NPO): Treats the forget set as negative preference data, penalizing the model for assigning it high likelihood.
- Task Vectors: Edits the weights by subtracting the weight difference obtained from further training (reinforcing) the model on the forget set (see the weight-arithmetic sketch after this list).
- Who’s Harry Potter (WHP): Defines the unlearned model by interpolating the token distributions of the original model and a model reinforced on the forget set.
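For intuition, here is a minimal sketch of the task-vector style weight arithmetic. The checkpoint names and the scaling factor alpha are illustrative assumptions; the actual recipe depends on how the reinforced model is trained.

```python
# Minimal sketch of task-vector unlearning: subtract the "reinforced minus original"
# weight delta from the original weights. Checkpoint names and alpha are illustrative.
import torch
from transformers import AutoModelForCausalLM

original = AutoModelForCausalLM.from_pretrained("original-model")          # trained on the full corpus
reinforced = AutoModelForCausalLM.from_pretrained("reinforced-on-forget")  # further fine-tuned on the forget set
alpha = 1.0  # negation strength; larger values unlearn more aggressively

reinforced_state = reinforced.state_dict()
unlearned_state = {}
for name, theta_orig in original.state_dict().items():
    task_vector = reinforced_state[name] - theta_orig          # direction that encodes the forget set
    unlearned_state[name] = theta_orig - alpha * task_vector   # move the weights away from it

original.load_state_dict(unlearned_state)  # 'original' now holds the unlearned weights
```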
Regularization techniques such as Gradient Descent on the Retain Set (GDR) and KL Divergence Minimization (KLR) are combined with GA and NPO to mitigate utility loss on the retain set; a minimal sketch follows.
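As an illustration of how such a regularizer slots in, here is a minimal sketch of gradient ascent with GDR, assuming HuggingFace-style causal-LM batches that contain input_ids and attention_mask; the optimizer, learning rate, and retain-loss weight are illustrative, not the paper's exact recipe.

```python
# Minimal sketch of GA with Gradient Descent on the Retain set (GA_GDR):
# ascend the loss on forget batches while descending the loss on retain batches.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("original-model")  # placeholder checkpoint
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
retain_weight = 1.0  # illustrative trade-off between forgetting and utility

def unlearn_epoch(forget_loader: DataLoader, retain_loader: DataLoader) -> None:
    model.train()
    for forget_batch, retain_batch in zip(forget_loader, retain_loader):
        # Causal-LM loss on the forget set, negated so that gradient descent ascends it.
        forget_loss = model(**forget_batch, labels=forget_batch["input_ids"]).loss
        # Standard loss on the retain set anchors utility (the GDR term).
        retain_loss = model(**retain_batch, labels=retain_batch["input_ids"]).loss
        loss = -forget_loss + retain_weight * retain_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Swapping the retain term for a KL-divergence penalty against the original model's output distribution on the retain set gives the KLR variant.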
Evaluation and Results
The unlearning efficacy of the methods was tested on two datasets: BBC news articles and the Harry Potter book series. The evaluation demonstrates that while most methods effectively address verbatim and knowledge memorization, they significantly compromise utility preservation and fail to prevent privacy leakage. Specifically, the results highlight:
- Effectiveness in Memorization Removal: Methods like GA and NPO, when combined with regularizers, significantly reduce verbatim and knowledge retention.
- Utility Degradation: The unlearned models frequently suffer a notable drop in performance on data they are supposed to retain, undermining the deployers' need for a practically usable model.
- Privacy Leakage: Most unlearning algorithms fail to prevent privacy leakage; they either under-unlearn (forget-set membership remains detectable) or over-unlearn (the forget set becomes conspicuously unlikely), so an observer can still infer which data was removed (see the membership-inference sketch after this list).
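To illustrate what under- and over-unlearning mean in practice, here is a simplified membership-inference sketch, not the paper's exact privacy metric: each example is scored by its average token loss, and an AUC far from 0.5 in either direction signals leakage.

```python
# Simplified membership-inference sketch (illustrative, not the paper's PrivLeak metric):
# score examples by per-token loss and test whether forget-set examples remain
# distinguishable from unseen holdout examples after unlearning.
import torch
from sklearn.metrics import roc_auc_score

def per_example_loss(model, tokenizer, text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

def membership_auc(model, tokenizer, forget_texts, holdout_texts) -> float:
    # Lower loss is treated as evidence of membership, so negate the loss as the score.
    scores = [-per_example_loss(model, tokenizer, t) for t in forget_texts + holdout_texts]
    labels = [1] * len(forget_texts) + [0] * len(holdout_texts)
    return roc_auc_score(labels, scores)
```

An AUC well above 0.5 suggests under-unlearning (forget data still looks "seen"), while an AUC well below 0.5 suggests over-unlearning (forget data looks suspiciously unlikely); both let an observer infer membership.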
Practical Implications and Future Directions
The findings emphasize crucial issues with current unlearning methods, specifically their inability to balance the requirements of data owners and deployers. The degradation in model utility and the persistence of privacy leakage highlight the insufficiency of simple optimization strategies for effective unlearning.
Theoretical implications include the need for novel algorithmic frameworks that can robustly address all six criteria. Practically, this could mean developing methods that better estimate the distributional impacts of unlearning operations or designing architecture-agnostic approaches that generalize well across model types and sizes.
Moving forward, research can benefit from:
- Enhanced Regularization Techniques: Inventing more sophisticated regularizers that can preserve model utility while effectively unlearning data.
- Robust Evaluation Metrics: Creating more granular evaluation criteria that account for diverse application scenarios and data types.
- Privacy-Guaranteeing Algorithms: Integrating differential privacy mechanisms directly into unlearning algorithms to ensure no privacy leakage.
The release of the MUSE benchmark provides a valuable tool for the field, enabling consistent and comprehensive evaluations of future unlearning algorithms.
Conclusion
This paper markedly advances machine unlearning by proposing a detailed, multi-dimensional evaluation framework and providing empirical evidence of the limitations in current methodologies. By rigorously assessing both theoretical and practical factors, the paper points the way toward more sophisticated and dependable unlearning techniques in machine learning applications. The MUSE benchmark is positioned to become a pivotal resource facilitating the ongoing development of robust unlearning solutions.