Dissecting Fine-Tuning Unlearning in Large Language Models (2410.06606v2)

Published 9 Oct 2024 in cs.CL and cs.LG

Abstract: Fine-tuning-based unlearning methods prevail for removing targeted harmful, sensitive, or copyrighted information from LLMs while preserving overall capabilities. However, the true effectiveness of these methods is unclear. In this work, we delve into the limitations of fine-tuning-based unlearning through activation patching and parameter restoration experiments. Our findings reveal that these methods alter the model's knowledge retrieval process, providing further evidence that they do not genuinely erase the problematic knowledge embedded in the model parameters. Instead, the coefficients generated by the MLP components in the model's final layer are the primary contributors to these seemingly positive unlearning effects, playing a crucial role in controlling the model's behavior. Furthermore, behavioral tests demonstrate that this unlearning mechanism inevitably impacts the global behavior of the models, affecting unrelated knowledge or capabilities. The code is released at https://github.com/yihuaihong/Dissecting-FT-Unlearning.
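For readers unfamiliar with the probing technique named in the abstract, below is a minimal sketch of activation patching using plain PyTorch forward hooks on a Hugging Face GPT-2 model. The model name, the toy prompts, and the choice to patch the final-layer MLP output are illustrative assumptions for exposition only, not the paper's exact setup; the authors' actual implementation is in the linked repository.

# Activation-patching sketch (assumptions: gpt2, toy prompts of equal token length).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

clean_prompt = "The capital of France is"    # run where the fact is retrieved
corrupt_prompt = "The capital of Italy is"   # contrastive run, same token count

layer_idx = model.config.n_layer - 1         # final transformer block
mlp = model.transformer.h[layer_idx].mlp     # its MLP sub-module

# 1) Cache the final-layer MLP output on the clean run.
cached = {}
def cache_hook(module, inputs, output):
    cached["mlp_out"] = output.detach()

handle = mlp.register_forward_hook(cache_hook)
with torch.no_grad():
    model(**tok(clean_prompt, return_tensors="pt"))
handle.remove()

# 2) Re-run on the corrupted prompt, overwriting that MLP output with the cache.
def patch_hook(module, inputs, output):
    return cached["mlp_out"]   # returning a tensor replaces the module's output

handle = mlp.register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(**tok(corrupt_prompt, return_tensors="pt")).logits
handle.remove()

# 3) If the patched component carries the relevant knowledge, the clean answer reappears.
print(tok.decode(logits[0, -1].argmax().item()))

The same hook-based recipe, applied component by component and layer by layer, is how one can localize which activations drive a behavior; the paper applies this style of analysis to argue that fine-tuning-based unlearning mainly reroutes retrieval through final-layer MLP coefficients rather than erasing the underlying knowledge.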

Authors (6)
  1. Yihuai Hong (6 papers)
  2. Yuelin Zou (5 papers)
  3. Lijie Hu (50 papers)
  4. Ziqian Zeng (32 papers)
  5. Di Wang (407 papers)
  6. Haiqin Yang (32 papers)
Citations (1)

