
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models (2407.01920v2)

Published 2 Jul 2024 in cs.CL, cs.AI, cs.CV, cs.LG, and cs.MM

Abstract: LLMs trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material. Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge. However, current unlearning paradigms are mired in vague forgetting boundaries, often erasing knowledge indiscriminately. In this work, we introduce KnowUnDo, a benchmark containing copyrighted content and user privacy domains to evaluate if the unlearning process inadvertently erases essential knowledge. Our findings indicate that existing unlearning methods often suffer from excessive unlearning. To address this, we propose a simple yet effective method, MemFlex, which utilizes gradient information to precisely target and unlearn sensitive parameters. Experimental results show that MemFlex is superior to existing methods in both precise knowledge unlearning and general knowledge retaining of LLMs. Code and dataset are released at https://github.com/zjunlp/KnowUnDo.
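The abstract's core idea, using gradient information to localize and update only the parameters most responsible for sensitive knowledge, can be sketched as a toy PyTorch loop. This is an illustrative sketch, not the released MemFlex implementation: the function names, the top-ratio threshold, and the gradient-ascent update rule are all assumptions made here for concreteness.

```python
import torch
import torch.nn as nn

def select_sensitive_params(model, forget_loss, top_ratio=0.05):
    """Rank parameters by the magnitude of their gradient w.r.t. a
    forget-set loss, and return boolean masks selecting roughly the
    top `top_ratio` fraction per tensor (illustrative only)."""
    model.zero_grad()
    forget_loss.backward()
    masks = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        g = p.grad.abs().flatten()
        k = max(1, int(top_ratio * g.numel()))
        thresh = torch.topk(g, k).values.min()
        masks[name] = p.grad.abs() >= thresh
    return masks

def masked_unlearn_step(model, forget_loss, masks, lr=1e-4):
    """One gradient-ascent step on the forget loss, applied only to
    the masked (localized) parameters; all others stay untouched."""
    model.zero_grad()
    forget_loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks and p.grad is not None:
                # Ascend the forget loss only where the mask is True.
                p += lr * p.grad * masks[name]
```

The intent of restricting the update to high-gradient coordinates is to erase targeted knowledge while leaving the bulk of the model, and hence its general capabilities, unchanged; the paper's released code should be consulted for the actual selection and update rules.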

Authors (9)
  1. Bozhong Tian
  2. Xiaozhuan Liang
  3. Siyuan Cheng
  4. Qingbin Liu
  5. Mengru Wang
  6. Dianbo Sui
  7. Xi Chen
  8. Huajun Chen
  9. Ningyu Zhang
Citations (3)