Evaluating ReLearn: An Unlearning Framework for LLMs
The paper "ReLearn: Unlearning via Learning for LLMs" presents a new methodology for effectively editing LLMs to forget specific information. This approach, termed ReLearn, addresses a critical challenge in AI ethics—removing unauthorized or private information from large-scale machine learning models without compromising their generative capabilities.
Current unlearning strategies often rely on reverse-optimization techniques such as Gradient Ascent (GA) and Negative Preference Optimization (NPO). While effective at suppressing the targeted information, these approaches frequently degrade the model's capacity for language generation, producing incoherent outputs. The paper identifies that such methods inadvertently penalize subsequent token predictions, resulting in what the authors term a "probability seesaw effect": an unpredictable oscillation in token probabilities that impairs fluency and coherence.
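To make the contrast concrete, here is a minimal sketch of a gradient-ascent unlearning step in PyTorch with Hugging Face transformers. It is not the paper's code; the model name and the forget example are placeholders. The key point is that negating the standard next-token cross-entropy pushes down the probability of every token in the forget sequence, including benign ones, which is the mechanism behind the degraded fluency described above.

```python
# Hypothetical sketch of a gradient-ascent (GA) unlearning step, for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_text = "Alice's phone number is 555-0123."  # hypothetical private fact
batch = tokenizer(forget_text, return_tensors="pt")

model.train()
outputs = model(**batch, labels=batch["input_ids"])
loss = -outputs.loss  # gradient ASCENT: maximize the NLL of the forget sample
loss.backward()       # every token in the sequence is pushed toward lower probability
optimizer.step()
optimizer.zero_grad()
```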
To counteract these issues, ReLearn combines data augmentation with fine-tuning. Rather than pushing probabilities down on the forget set, the model is fine-tuned on synthetic data in which sensitive information is replaced with authorized content, so it learns new completions while preserving its generative performance. The authors also introduce an evaluation framework with three metrics: Knowledge Forgetting Rate (KFR), Knowledge Retention Rate (KRR), and Linguistic Score (LS). KFR measures how thoroughly the targeted information is removed, KRR measures how well the knowledge that should be retained remains intact, and LS assesses the quality of the model's language generation after unlearning.
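The following sketch illustrates the "unlearning via learning" idea in the same PyTorch setup as above. It assumes a simple question-answer format; the augmented replacement answers, model name, and training details are placeholders rather than the authors' exact pipeline.

```python
# Illustrative sketch of positive fine-tuning on an augmented forget set.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Augmented forget set: the original question paired with a safe replacement
# answer instead of the private one, plus a paraphrase to cover rewordings.
augmented_pairs = [
    ("What is Alice's phone number?", "I'm sorry, that information isn't available."),
    ("Tell me Alice's phone number.", "I can't share personal contact details."),
]

model.train()
for question, safe_answer in augmented_pairs:
    text = f"Question: {question}\nAnswer: {safe_answer}"
    batch = tokenizer(text, return_tensors="pt")
    # Standard (positive) next-token loss on the replacement answer: the model
    # learns new, authorized completions instead of being pushed away from the
    # old ones, which is what keeps generation coherent.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```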
Remarkably, the authors demonstrate that ReLearn achieves a balance in which models exhibit high KFR, in some cases up to 85%, while preserving an effective KRR, sustaining performance close to that of models that have not undergone unlearning. This balance highlights the framework's ability to retain relevant knowledge and linguistic quality, mitigating the typical shortcomings of other unlearning techniques.
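As a rough intuition for what these metrics capture, the sketch below checks whether a placeholder private fact still surfaces in the model's answers (a crude proxy for forgetting) and whether an unrelated fact still does (a crude proxy for retention). This is not the paper's KFR/KRR computation, which evaluates knowledge at a deeper level than string matching; the questions, answers, and model name are illustrative.

```python
# Crude string-match proxy for forgetting/retention checks; NOT the paper's KFR/KRR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; in practice, the unlearned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def answer(question: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode an answer to a question prompt."""
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

forget_set = [("What is Alice's phone number?", "555-0123")]   # should no longer appear
retain_set = [("What is the capital of France?", "Paris")]     # should still appear

forget_proxy = sum(gold not in answer(q) for q, gold in forget_set) / len(forget_set)
retain_proxy = sum(gold in answer(q) for q, gold in retain_set) / len(retain_set)
print(f"forget proxy: {forget_proxy:.2f}  retain proxy: {retain_proxy:.2f}")
```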
In theoretical terms, ReLearn suggests a shift from disrupting token predictions through reverse optimization to enhancing model adaptability through augmentation and positive optimization. Mitigating the probability seesaw preserves the model's language capacities and brings it closer to human-like cognitive flexibility: absorbing new knowledge while appropriately discarding specific information.
Practically, this research has significant implications for regulatory compliance in AI systems, especially those trained on sensitive datasets. By ensuring models can effectively "forget" unauthorized knowledge, ReLearn strengthens AI systems against legal challenges such as those arising from data-privacy and copyright legislation.
The authors also acknowledge the remaining challenges of this approach, noting the computational overhead of synthesizing augmented datasets and the need for finer-grained metrics that better capture nuanced, concept-level forgetting. This points to future work on optimizing data synthesis and refining evaluation standards to improve the framework's scalability and applicability.
Overall, ReLearn offers a promising path toward practical and ethical AI management, emphasizing an approach to unlearning that preserves model robustness and performance integrity. It bridges a crucial gap in machine learning applications where regulatory and ethical constraints are paramount, fostering safer and more compliant AI technologies.