Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs (2406.11780v1)

Published 17 Jun 2024 in cs.LG, cs.AI, and cs.CL

Abstract: LLMs have shown to pose social and ethical risks such as generating toxic language or facilitating malicious use of hazardous knowledge. Machine unlearning is a promising approach to improve LLM safety by directly removing harmful behaviors and knowledge. In this paper, we propose "SPlit, UNlearn, MerGE" (SPUNGE), a framework that can be used with any unlearning method to amplify its effectiveness. SPUNGE leverages data attributes during unlearning by splitting unlearning data into subsets based on specific attribute values, unlearning each subset separately, and merging the unlearned models. We empirically demonstrate that SPUNGE significantly improves the performance of two recent unlearning methods on state-of-the-art LLMs while maintaining their general capabilities on standard academic benchmarks.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Swanand Ravindra Kadhe (9 papers)
  2. Farhan Ahmed (12 papers)
  3. Dennis Wei (64 papers)
  4. Nathalie Baracaldo (34 papers)
  5. Inkit Padhi (31 papers)
Citations (3)