Analyzing Unlearning via Model Merging in LLMs: Insights from the ZJUKLAB Approach
The paper, "ZJUKLAB at SemEval-2025. Task 4: Unlearning via Model Merging," presents a sophisticated system for unlearning sensitive data in LLMs, a crucial aspect of AI safety. The research describes the implementation of an unlearning system that employs the concept of model merging, notably through TIES-Merging, to address the limitations of current unlearning methodologies such as over-forgetting and under-forgetting.
Methodology and Results
The system follows a two-phase design: a Training Phase and a Merging Phase. During the Training Phase, two models with complementary biases, one erring toward over-forgetting and the other toward under-forgetting, are fine-tuned with Low-Rank Adaptation (LoRA) under distinct hyperparameter settings. The training objective combines Negative Preference Optimization (NPO) on the forget set with Gradient Descent on the Retain Set (GDR) and Kullback-Leibler Divergence Minimization on the Retain Set (KLR), as sketched below.
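The paper's exact loss weighting is not reproduced here; the following is a minimal PyTorch sketch of how the three terms compose, where `beta`, `lam_gdr`, and `lam_klr` are illustrative hyperparameters rather than the paper's values.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(policy_forget_logps, ref_forget_logps,
                    policy_retain_logits, ref_retain_logits, retain_labels,
                    beta=0.1, lam_gdr=1.0, lam_klr=1.0):
    """Combined NPO + GDR + KLR objective (illustrative weighting)."""
    # NPO on the forget set: penalize the policy for assigning higher
    # log-probability to forget-set sequences than the reference model does.
    log_ratio = policy_forget_logps - ref_forget_logps
    npo = -(2.0 / beta) * F.logsigmoid(-beta * log_ratio).mean()

    # GDR: plain next-token cross-entropy on the retain set, preserving utility.
    vocab = policy_retain_logits.size(-1)
    gdr = F.cross_entropy(policy_retain_logits.view(-1, vocab),
                          retain_labels.view(-1), ignore_index=-100)

    # KLR: keep the policy's retain-set distribution close to the reference model's.
    klr = F.kl_div(F.log_softmax(policy_retain_logits, dim=-1),
                   F.log_softmax(ref_retain_logits, dim=-1),
                   reduction="batchmean", log_target=True)

    return npo + lam_gdr * gdr + lam_klr * klr
```

Tilting `lam_gdr` and `lam_klr` up or down is what produces the two complementary models: a retention-heavy weighting under-forgets, a forgetting-heavy one over-forgets.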
The subsequent Merging Phase uses TIES-Merging to combine the two models' LoRA adapters through three operations, Trim, Elect Sign, and Disjoint Merge, which resolve parameter interference and combine the strengths of both models. The merged model performs strongly, achieving a Task Aggregate score of 0.944 and an overall Aggregate score of 0.487 at SemEval-2025, ranking second among 26 teams.
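TIES-Merging itself is simple to state. The sketch below assumes flattened task vectors (fine-tuned weights minus base weights, which is what a materialized LoRA update amounts to); `density` is an illustrative trim ratio, not the paper's setting.

```python
import torch

def ties_merge(task_vectors, density=0.2):
    """TIES-Merging over a list of equally shaped 1-D task vectors."""
    # Trim: per vector, keep only the top-`density` fraction of entries by magnitude.
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(density * tv.numel()))
        cutoff = tv.abs().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= cutoff, tv, torch.zeros_like(tv)))
    stacked = torch.stack(trimmed)  # shape: (n_models, n_params)

    # Elect Sign: pick each parameter's sign by the sign of the total mass.
    elected_sign = torch.sign(stacked.sum(dim=0))

    # Disjoint Merge: average only the entries that agree with the elected sign.
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    counts = agree.sum(dim=0).clamp(min=1)
    return (stacked * agree).sum(dim=0) / counts
```

Applied per parameter tensor of the two LoRA-adapted models, this resolves sign conflicts instead of averaging them away, which is what lets the merged model inherit both models' strengths.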
Experiments and Analysis
Table 1 of the paper compares the system's performance against other submissions in both online and local evaluations. The comparison shows that the system not only attains high Task Aggregate scores but also achieves near-ideal MIA scores, supporting the effectiveness of the merging strategy.
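As background (this is a generic loss-based membership-inference check, not the task's exact metric), an MIA typically scores examples by model loss and measures how well that score separates forget-set members from held-out non-members; an AUC near 0.5 is ideal.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def loss_based_mia_auc(member_losses, nonmember_losses):
    """AUC of a simple loss-threshold membership attack.

    Lower loss hints the model has memorized a sample, so the attack
    score is the negated loss. AUC near 0.5 means members and
    non-members are indistinguishable, the ideal outcome after unlearning.
    """
    scores = np.concatenate([-np.asarray(member_losses),
                             -np.asarray(nonmember_losses)])
    labels = np.concatenate([np.ones(len(member_losses)),
                             np.zeros(len(nonmember_losses))])
    return roc_auc_score(labels, scores)
```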
The paper also analyzes training from both a loss and a weight perspective. Regurgitation and Knowledge Scores are tracked across epochs, revealing how the optimization trades off forgetting against retention over time. Loss trajectories for the NPO+GDR+KLR model oscillate, indicating competing pressures between the forgetting and retention objectives.
Limitations and Future Directions
The paper acknowledges limitations in its framework, particularly the phenomenon of over-forgetting, in which the model inadvertently discards non-sensitive information. Concrete failure cases, such as repetitive character outputs and the loss of generic knowledge, are identified, motivating improvements in both unlearning design and metrics. The paper also criticizes current evaluation metrics such as ROUGE and MIA for failing to fully capture unlearning success, in part because of their sensitivity to superficial textual variations.
Reflecting on what unlearning should aim for, the authors argue for concentrating on evaluation and policy implications rather than on exhaustive unlearning dimensions such as resilience to relearning attacks. As future work, they emphasize the need for robust, on-demand unlearning mechanisms.
Conclusion
This paper introduces a methodologically sound approach to unlearning via model merging, a meaningful advance in managing sensitive information within LLMs. While effective, it also calls for further work on evaluation methodology and the establishment of practicable unlearning benchmarks. The insights from this paper lay a foundation for future discussion and development in AI safety, particularly around privacy and copyright.