ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging (2503.21088v2)

Published 27 Mar 2025 in cs.CL, cs.AI, cs.CV, cs.LG, and cs.MM

Abstract: This paper presents the ZJUKLAB team's submission for SemEval-2025 Task 4: Unlearning Sensitive Content from LLMs. This task aims to selectively erase sensitive knowledge from LLMs, avoiding both over-forgetting and under-forgetting issues. We propose an unlearning system that leverages Model Merging (specifically TIES-Merging), combining two specialized models into a more balanced unlearned model. Our system achieves competitive results, ranking second among 26 teams, with an online score of 0.944 for Task Aggregate and 0.487 for overall Aggregate. In this paper, we also conduct local experiments and perform a comprehensive analysis of the unlearning process, examining performance trajectories, loss dynamics, and weight perspectives, along with several supplementary experiments, to understand the effectiveness of our method. Furthermore, we analyze the shortcomings of our method and evaluation metrics, emphasizing that MIA scores and ROUGE-based metrics alone are insufficient to fully evaluate successful unlearning. Finally, we emphasize the need for more comprehensive evaluation methodologies and rethinking of unlearning objectives in future research. Code is available at https://github.com/zjunlp/unlearn/tree/main/semeval25.

Summary

Analyzing Unlearning via Model Merging in LLMs: Insights from the ZJUKLAB Approach

The paper, "ZJUKLAB at SemEval-2025. Task 4: Unlearning via Model Merging," presents a sophisticated system for unlearning sensitive data in LLMs, a crucial aspect of AI safety. The research describes the implementation of an unlearning system that employs the concept of model merging, notably through TIES-Merging, to address the limitations of current unlearning methodologies such as over-forgetting and under-forgetting.

Methodology and Results

The proposed unlearning system is noteworthy for its two-phase design: a Training Phase and a Merging Phase. During the Training Phase, two models with complementary biases are trained using Low-Rank Adaptation (LoRA). The optimization framework combines Negative Preference Optimization (NPO), Gradient Descent on the Retain set (GDR), and Kullback-Leibler Divergence Minimization on the Retain set (KLR). Training with two distinct hyperparameter configurations yields models with complementary strengths.
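The summary does not reproduce the exact training objective, so the following is only a minimal sketch of how an NPO + GDR + KLR loss could be assembled; the helper names, the beta value, and the weighting coefficients are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a combined unlearning objective (NPO + GDR + KLR).
# `policy_logps` / `ref_logps` are summed token log-probabilities of the forget
# answers under the current and frozen reference models; the `retain_*` tensors
# come from a separate retain-set batch.

def npo_loss(policy_logps, ref_logps, beta=0.1):
    # Negative Preference Optimization: push the policy's likelihood of the
    # forget answers below the reference model's likelihood.
    return -(2.0 / beta) * F.logsigmoid(-beta * (policy_logps - ref_logps)).mean()

def gdr_loss(retain_logits, retain_labels):
    # Gradient Descent on the Retain set: standard next-token cross-entropy.
    return F.cross_entropy(
        retain_logits.view(-1, retain_logits.size(-1)),
        retain_labels.view(-1),
        ignore_index=-100,
    )

def klr_loss(retain_logits, retain_ref_logits):
    # KL minimization on the Retain set: keep the current model's predictive
    # distribution close to the reference model's distribution.
    log_p = F.log_softmax(retain_logits, dim=-1)
    p_ref = F.softmax(retain_ref_logits, dim=-1)
    return F.kl_div(log_p, p_ref, reduction="batchmean")

def total_loss(policy_logps, ref_logps, retain_logits, retain_labels,
               retain_ref_logits, alpha=1.0, gamma=1.0):
    # alpha and gamma are placeholder retain-loss weights.
    return (npo_loss(policy_logps, ref_logps)
            + alpha * gdr_loss(retain_logits, retain_labels)
            + gamma * klr_loss(retain_logits, retain_ref_logits))
```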

The subsequent Merging Phase applies TIES-Merging to combine the two models' LoRA adapters. This phase consists of the Trim, Elect Sign, and Disjoint Merge operations, which resolve parameter interference and combine the strengths of both models. The merged model performs strongly, achieving a Task Aggregate score of 0.944 and an overall Aggregate score of 0.487 in the SemEval-2025 online evaluation, ranking second among 26 teams.
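TIES-Merging itself follows a standard recipe: trim low-magnitude updates, elect a per-parameter sign, and average only the values that agree with it. Below is a minimal sketch of those three steps applied to flattened weight deltas (e.g., LoRA updates relative to the base model); the function name and the density value are assumptions for illustration, not the paper's configuration.

```python
import torch

def ties_merge(task_vectors, density=0.2):
    """Merge flattened task vectors (weight deltas) with TIES-style steps:
    Trim, Elect Sign, Disjoint Merge. `density` is the fraction of
    largest-magnitude entries kept per vector (illustrative value)."""
    trimmed = []
    for tv in task_vectors:
        # Trim: zero out all but the top-`density` fraction by magnitude.
        k = max(1, int(density * tv.numel()))
        threshold = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv)))

    stacked = torch.stack(trimmed)  # (num_models, num_params)

    # Elect: choose, per parameter, the sign with the larger total magnitude.
    elected_sign = torch.sign(stacked.sum(dim=0))

    # Disjoint Merge: average only the entries that agree with the elected sign.
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    counts = agree.sum(dim=0).clamp(min=1)
    return (stacked * agree).sum(dim=0) / counts

# Usage: merged_delta = ties_merge([delta_model_a, delta_model_b])
# The merged delta is then added back onto the base model's weights.
```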

Experiments and Analysis

Table 1 of the paper compares the system's performance against other submissions in both the online and local evaluations. The comparison shows that the system not only achieves high Task Aggregate scores but also attains ideal MIA scores, validating the effectiveness of the merging strategy.

Furthermore, the paper provides a detailed investigation of the loss dynamics and weight perspectives during training. Regurgitation and Knowledge Scores are tracked across epochs, revealing how forgetting and retention evolve over the course of optimization. Loss trajectories for the NPO+GDR+KLR model exhibit oscillations, indicative of the competing pressures between the forgetting and retention objectives.

Limitations and Future Directions

The paper acknowledges certain limitations of its framework, particularly the phenomenon of over-forgetting, in which the model inadvertently erases non-sensitive information or knowledge. Specific failure modes, such as repetitive character outputs and forgetting of generic knowledge, are identified, motivating improvements in unlearning design and metrics. The paper also criticizes current evaluation metrics such as ROUGE and MIA for failing to fully capture unlearning success, owing to their sensitivity to superficial textual variations.

Reflecting on unlearning objectives, the authors argue for focusing on evaluation and policy implications rather than on exhaustive unlearning dimensions such as resilience to relearning attacks. As future directions, they emphasize the need for robust, on-demand unlearning mechanisms and more comprehensive evaluation methodologies.

Conclusion

This paper introduces a methodologically sound approach to unlearning via model merging, advancing the management of sensitive information within LLMs. While effective, it also calls for further exploration of evaluation methodologies and the establishment of practicable unlearning benchmarks. The insights from this work lay a foundation for future discussion and development in AI safety, particularly around pressing challenges in privacy and copyright.
