Machine Unlearning in Generative AI: A Survey (2407.20516v1)

Published 30 Jul 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Generative AI technologies, such as (multimodal) LLMs and vision generative models, have been deployed in many settings. Their remarkable performance is largely attributable to massive training data and emergent reasoning abilities. However, these models can memorize and generate sensitive, biased, or dangerous information originating from their training data, especially data collected through web crawling. New machine unlearning (MU) techniques are being developed to reduce or eliminate undesirable knowledge and its effects from the models, because techniques designed for traditional classification tasks cannot be applied directly to Generative AI. This survey covers many aspects of MU in Generative AI, including a new problem formulation, evaluation methods, and a structured discussion of the advantages and limitations of different kinds of MU techniques. It also presents several critical challenges and promising directions in MU research. A curated list of readings can be found at https://github.com/franciscoliu/GenAI-MU-Reading.

References (194)
  1. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
  2. Git re-basin: Merging models modulo permutation symmetries. arXiv preprint arXiv:2209.04836 (2022).
  3. Mathqa: Towards interpretable math word problem solving with operation-based formalisms. arXiv preprint arXiv:1905.13319 (2019).
  4. Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. arXiv preprint arXiv:2404.02151 (2024).
  5. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862 (2022).
  6. P Bedapudi. 2019. Nudenet: Neural nets for nudity classification, detection and selective censoring.
  7. On the dangers of stochastic parrots: Can language models be too big?. In FAccT.
  8. Piqa: Reasoning about physical commonsense in natural language. In AAAI.
  9. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301 (2023).
  10. Digital Forgetting in Large Language Models: A Survey of Unlearning Methods. arXiv preprint arXiv:2404.02062 (2024).
  11. Nuanced metrics for measuring unintended bias with real data for text classification. In WWW.
  12. Machine unlearning. In IEEE Symposium on Security and Privacy (SP).
  13. Yinzhi Cao and Junfeng Yang. 2015. Towards making systems forget with machine unlearning. In IEEE symposium on security and privacy.
  14. Jiaao Chen and Diyi Yang. 2023. Unlearn what you want to forget: Efficient unlearning for llms. arXiv preprint arXiv:2310.20150 (2023).
  15. Jiali Cheng and Hadi Amiri. 2023. Multimodal Machine Unlearning. arXiv preprint arXiv:2311.12047 (2023).
  16. Can We Edit Multimodal Large Language Models? arXiv preprint arXiv:2310.08475 (2023).
  17. Efficient model updates for approximate unlearning of graph-structured data. In ICLR.
  18. Dall-eval: Probing the reasoning skills and social biases of text-to-image generation models. In ICCV.
  19. Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher. In AAAI.
  20. Zero-shot machine unlearning. IEEE Transactions on Information Forensics and Security (2023).
  21. Evaluating the ripple effects of knowledge editing in language models. ACL (2024).
  22. Holistic analysis of hallucination in gpt-4v (ision): Bias and interference challenges. arXiv preprint arXiv:2311.03287 (2023).
  23. Zheng Dai and David K Gifford. 2023. Training data attribution for diffusion models. arXiv preprint arXiv:2306.02174 (2023).
  24. Quang-Vinh Dang. 2021. Right to be forgotten in the age of machine learning. In Advances in Digital Science: ICADS 2021.
  25. Larimar: Large Language Models with Episodic Memory Control. arXiv preprint arXiv:2403.11901 (2024).
  26. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
  27. Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination. arXiv preprint arXiv:2402.10052 (2024).
  28. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
  29. Avoiding Copyright Infringement via Machine Unlearning. arXiv preprint arXiv:2406.10952 (2024).
  30. Lifelong anomaly detection through unlearning. In CCS.
  31. Olivia G d’Aliberti and Mark A Clark. 2022. Preserving patient privacy during computation over shared electronic health record data. Journal of Medical Systems (2022).
  32. Ronen Eldan and Mark Russinovich. 2023. Who’s Harry Potter? Approximate Unlearning in LLMs. arXiv preprint arXiv:2310.02238 (2023).
  33. Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. arXiv preprint arXiv:2310.12508 (2023).
  34. Jonathan Frankle and Michael Carbin. 2018. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635 (2018).
  35. Linear mode connectivity and the lottery ticket hypothesis. In ICML.
  36. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618 (2022).
  37. Erasing concepts from diffusion models. In ICCV.
  38. Unified concept editing in diffusion models. In WACV.
  39. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858 (2022).
  40. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv preprint arXiv:2101.00027 (2020).
  41. Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. arXiv preprint arXiv:2203.14680 (2022).
  42. Making ai forget you: Data deletion in machine learning. Neurips (2019).
  43. SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization.
  44. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In CVPR.
  45. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In ECCV.
  46. Generative adversarial nets. Neurips (2014).
  47. Model editing can hurt general abilities of large language models. arXiv preprint arXiv:2401.04700 (2024).
  48. Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models. arXiv preprint arXiv:2403.10557 (2024).
  49. Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast. arXiv preprint arXiv:2402.08567 (2024).
  50. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030 (2019).
  51. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300 (2020).
  52. Alvin Heng and Harold Soh. 2024. Selective amnesia: A continual learning approach to forgetting in deep generative models. Neurips (2024).
  53. Denoising diffusion probabilistic models. Neurips (2020).
  54. The European Union general data protection regulation: what it is and what it means. Information & Communications Technology Law (2019).
  55. Parameter-efficient transfer learning for NLP. In ICML.
  56. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).
  57. Separate the wheat from the chaff: Model deficiency unlearning via parameter-efficient module operation. In AAAI.
  58. Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers. arXiv preprint arXiv:2311.17717 (2023).
  59. Catastrophic jailbreak of open-source llms via exploiting generation. arXiv preprint arXiv:2310.06987 (2023).
  60. Offset Unlearning for Large Language Models. arXiv preprint arXiv:2404.11045 (2024).
  61. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089 (2022).
  62. Patching open-vocabulary models by interpolating weights. Neurips (2022).
  63. Erin Illman and Paul Temple. 2019. California consumer privacy act. The Business Lawyer (2019).
  64. Masaru Isonuma and Ivan Titov. 2024. Unlearning Reveals the Influential Training Data of Language Models. arXiv preprint arXiv:2401.15241 (2024).
  65. Knowledge unlearning for mitigating privacy risks in language models. arXiv preprint arXiv:2210.01504 (2022).
  66. Beavertails: Towards improved safety alignment of llm via a human-preference dataset. Neurips (2024).
  67. Dataless knowledge fusion by merging weights of language models. arXiv preprint arXiv:2212.09849 (2022).
  68. Fairsisa: Ensemble post-processing to improve fairness of unlearning in llms. arXiv preprint arXiv:2312.07420 (2023).
  69. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017).
  70. Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models. In EMNLP.
  71. Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).
  72. Overcoming catastrophic forgetting in neural networks. PNAS (2017).
  73. Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In International conference on machine learning.
  74. Privacy adhering machine un-learning in nlp. arXiv preprint arXiv:2212.09573 (2022).
  75. Ablating concepts in text-to-image diffusion models. In ICCV.
  76. Deduplicating training data makes language models better. arXiv preprint arXiv:2107.06499 (2021).
  77. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021).
  78. Controllable text-to-image generation. In Neurips.
  79. Machine Unlearning for Image-to-Image Generative Models. arXiv preprint arXiv:2402.00351 (2024).
  80. Halueval: A large-scale hallucination evaluation benchmark for large language models. In EMNLP.
  81. Pre-trained language models for text generation: A survey. Comput. Surveys (2024).
  82. The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning. arXiv preprint arXiv:2403.03218 (2024).
  83. Pmet: Precise model editing in a transformer. In AAAI.
  84. Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data. arXiv preprint arXiv:2307.00456 (2023).
  85. Evaluating object hallucination in large vision-language models. arXiv preprint arXiv:2305.10355 (2023).
  86. Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning. arXiv preprint arXiv:2403.16257 (2024).
  87. Truthfulqa: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958 (2021).
  88. Microsoft coco: Common objects in context. In ECCV.
  89. Continual learning and private unlearning. In CoLLAs.
  90. Mitigating hallucination in large multi-modal models via robust instruction tuning. In ICLR.
  91. Improved baselines with visual instruction tuning. arXiv preprint arXiv:2310.03744 (2023).
  92. Visual instruction tuning. Neurips (2024).
  93. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Neurips (2022).
  94. Rethinking Machine Unlearning for Large Language Models. arXiv preprint arXiv:2402.08787 (2024).
  95. Backdoor defense with machine unlearning. In IEEE INFOCOM.
  96. Towards Safer Large Language Models through Machine Unlearning. arXiv preprint arXiv:2402.10058 (2024).
  97. Breaking the trilemma of privacy, utility, efficiency via controllable machine unlearning. arXiv preprint arXiv:2310.18574 (2023).
  98. Zhe Liu and Ozlem Kalinli. 2024. Forgetting Private Textual Sequences in Language Models Via Leave-One-Out Ensemble. In ICASSP.
  99. Threats, attacks, and defenses in machine unlearning: A survey. arXiv preprint arXiv:2403.13682 (2024).
  100. Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge. arXiv preprint arXiv:2404.05880 (2024).
  101. Quark: Controllable text generation with reinforced unlearning. Neurips (2022).
  102. Li Lucy and David Bamman. 2021. Gender and representation bias in GPT-3 generated stories. In Proceedings of the third workshop on narrative understanding.
  103. Eight Methods to Evaluate Robust Unlearning in LLMs. arXiv preprint arXiv:2402.16835 (2024).
  104. Learning word vectors for sentiment analysis. In ACL.
  105. Tofu: A task of fictitious unlearning for llms. arXiv preprint arXiv:2401.06121 (2024).
  106. Hard to forget: Poisoning attacks on certified machine unlearning. In AAAI.
  107. Michael Matena and Colin Raffel. 2021. Merging models with fisher-weighted averaging. arXiv preprint arXiv:2111.09832 (2021).
  108. Michael S Matena and Colin A Raffel. 2022. Merging models with fisher-weighted averaging. Neurips (2022).
  109. Hatexplain: A benchmark dataset for explainable hate speech detection. In AAAI.
  110. Locating and editing factual associations in GPT. Neurips (2022).
  111. Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
  112. Memory-based model editing at scale. In ICML.
  113. Feature unlearning for pre-trained gans and vaes. In AAAI.
  114. StereoSet: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456 (2020).
  115. CrowS-pairs: A challenge dataset for measuring social biases in masked language models. arXiv preprint arXiv:2010.00133 (2020).
  116. Variational bayesian unlearning. Neurips (2020).
  117. A survey of machine unlearning. arXiv preprint arXiv:2209.02299 (2022).
  118. Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models. arXiv preprint arXiv:2311.08011 (2023).
  119. Jailbreaking attack against multimodal large language model. arXiv preprint arXiv:2402.02309 (2024).
  120. Editing implicit assumptions in text-to-image diffusion models. In ICCV.
  121. Training language models to follow instructions with human feedback. Neurips (2022).
  122. Unlearning graph classifiers with limited data resources. In WWW.
  123. Subhodip Panda and Prathosh AP. 2023. FAST: Feature Aware Similarity Thresholding for Weak Unlearning in Black-Box Generative Models. arXiv preprint arXiv:2312.14895 (2023).
  124. Stuart L Pardau. 2018. The california consumer privacy act: Towards a european-style privacy regime in the united states. J. Tech. L. & Pol’y (2018).
  125. In-context unlearning: Language models as few shot unlearners. arXiv preprint arXiv:2310.07579 (2023).
  126. Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251 (2022).
  127. Nicholas Pochinkov and Nandi Schoots. 2024. Dissecting Language Models: Machine Unlearning via Selective Pruning. arXiv preprint arXiv:2403.01267 (2024).
  128. General Data Protection Regulation (GDPR). 2018. Intersoft Consulting. Accessed October 2018.
  129. Fine-tuning aligned language models compromises safety, even when users do not intend to! arXiv preprint arXiv:2310.03693 (2023).
  130. The Frontier of Data Erasure: Machine Unlearning for Large Language Models. arXiv preprint arXiv:2403.15779 (2024).
  131. Learning transferable visual models from natural language supervision. In ICML.
  132. Improving language understanding by generative pre-training. (2018).
  133. Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020).
  134. Copyright Protection in Generative AI: A Technical Perspective. arXiv preprint arXiv:2402.02333 (2024).
  135. Object hallucination in image captioning. arXiv preprint arXiv:1809.02156 (2018).
  136. High-resolution image synthesis with latent diffusion models. In CVPR.
  137. Jeffrey Rosen. 2011. The right to be forgotten. Stan. L. Rev. Online (2011).
  138. JK Rowling. 1997. Harry Potter [book series]. London: Bloomsbury and Little, Brown (1997).
  139. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In CVPR.
  140. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv preprint arXiv:2402.07927 (2024).
  141. Addressing cognitive bias in medical language models. arXiv preprint arXiv:2402.08113 (2024).
  142. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. In CVPR.
  143. Laion-5b: An open large-scale dataset for training next generation image-text models. Neurips (2022).
  144. Model evaluation for extreme risks. arXiv preprint arXiv:2305.15324 (2023).
  145. Continual learning with deep generative replay. Neurips (2017).
  146. Knowledge unlearning for llms: Tasks, methods, and challenges. arXiv preprint arXiv:2311.15766 (2023).
  147. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP.
  148. Generative adversarial networks unlearning. arXiv preprint arXiv:2308.09881 (2023).
  149. Aligning large multimodal models with factually augmented rlhf. arXiv preprint arXiv:2309.14525 (2023).
  150. Evaluating and mitigating discrimination in language model decisions. arXiv preprint arXiv:2312.03689 (2023).
  151. Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning. arXiv preprint arXiv:2402.04401 (2024).
  152. Guardrail Baselines for Unlearning in LLMs. arXiv preprint arXiv:2403.03329 (2024).
  153. Unrolling sgd: Understanding factors influencing machine unlearning. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P).
  154. Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks. arXiv preprint arXiv:2309.14054 (2023).
  155. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
  156. LEDGAR: A large-scale multi-label corpus for text classification of legal provisions in contracts. In LREC.
  157. Attention is all you need. Neurips (2017).
  158. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. JMLR (2010).
  159. Biasasker: Measuring the bias in conversational ai system. In FSE Conference.
  160. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018).
  161. Kga: A general machine unlearning framework based on knowledge gap alignment. arXiv preprint arXiv:2305.06535 (2023).
  162. Selective forgetting: Advancing machine unlearning techniques and evaluation in language models. arXiv preprint arXiv:2402.05813 (2024).
  163. Editing Conceptual Knowledge for Large Language Models. arXiv preprint arXiv:2403.06259 (2024).
  164. Large Scale Knowledge Washing. arXiv preprint arXiv:2405.16720 (2024).
  165. Machine unlearning of features and labels. arXiv preprint arXiv:2108.11577 (2021).
  166. Albert Webson and Ellie Pavlick. 2021. Do prompt-based models really understand the meaning of their prompts? arXiv preprint arXiv:2109.01247 (2021).
  167. Jailbroken: How does llm safety training fail? Neurips (2024).
  168. Mika Westerlund. 2019. The emergence of deepfake technology: A review. Technology innovation management review (2019).
  169. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In ICML.
  170. Robust fine-tuning of zero-shot models. In CVPR.
  171. Erasediff: Erasing data influence in diffusion models. arXiv preprint arXiv:2401.05779 (2024).
  172. Depn: Detecting and editing privacy neurons in pretrained language models. arXiv preprint arXiv:2310.20138 (2023).
  173. Jailbreaking gpt-4v via self-adversarial attacks with system prompts. arXiv preprint arXiv:2311.09127 (2023).
  174. EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models. arXiv preprint arXiv:2402.09801 (2024).
  175. Machine Unlearning: A Survey. CSUR (2023).
  176. Yi Xu. 2024. Machine Unlearning for Traditional Models and Large Language Models: A Short Survey. arXiv preprint arXiv:2404.01206 (2024).
  177. Machine Unlearning of Pre-trained Large Language Models. arXiv preprint arXiv:2402.15159 (2024).
  178. Large language model unlearning. arXiv preprint arXiv:2310.10683 (2023).
  179. Unlearning bias in language models by partitioning gradients. In Findings of the ACL.
  180. Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback. arXiv preprint arXiv:2312.00849 (2023).
  181. Hellaswag: Can a machine really finish your sentence? arXiv preprint arXiv:1905.07830 (2019).
  182. Right to be forgotten in the era of large language models: Implications, challenges, and solutions. arXiv preprint arXiv:2307.03941 (2023).
  183. Forget-me-not: Learning to forget in text-to-image diffusion models. arXiv preprint arXiv:2303.17591 (2023).
  184. Composing Parameter-Efficient Modules with Arithmetic Operation. Neurips (2024).
  185. Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning. arXiv preprint arXiv:2404.05868 (2024).
  186. Personalizing dialogue agents: I have a dog, do you have pets too? arXiv preprint arXiv:1801.07243 (2018).
  187. To generate or not? safety-driven unlearned diffusion models are still easy to generate unsafe images… for now. arXiv preprint arXiv:2310.11868 (2023).
  188. UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models. arXiv preprint arXiv:2402.11846 (2024).
  189. Learning and forgetting unsafe examples in large language models. arXiv preprint arXiv:2312.12736 (2023).
  190. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
  191. Judging llm-as-a-judge with mt-bench and chatbot arena. Neurips (2024).
  192. Making harmful behaviors unlearnable for large language models. arXiv preprint arXiv:2311.02105 (2023).
  193. Towards language-free training for text-to-image generation. In CVPR.
  194. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. In ICCV.

Summary

  • The paper introduces a novel problem formulation for unlearning specific data subsets in generative AI, evaluated using accuracy, locality, and generalizability metrics.
  • It details advanced methods including gradient-based optimization, knowledge distillation, and modular adjustments, alongside in-context unlearning techniques.
  • The survey highlights practical applications like bias reduction, privacy compliance, and safety alignment, while outlining future challenges and research directions.

Machine Unlearning in Generative AI: A Survey

The paper "Machine Unlearning in Generative AI: A Survey," authored by Zheyuan Liu et al., provides an in-depth examination of current techniques, challenges, and applications of Machine Unlearning (MU) specifically tailored for Generative AI models. This survey aims to encapsulate the current state of MU in generative models, adding clarity and direction to a budding area of research, with the objective of rendering AI models more reliable, safe, and compliant with privacy requisites.

Introduction and Motivation

The escalating adoption and deployment of generative AI technology, spanning LLMs, vision generative models, and Multimodal LLMs (MLLMs), have magnified the risks of these models memorizing and propagating sensitive, biased, or harmful information inherited from their training data. Traditional machine unlearning techniques, designed primarily for classification tasks, fall short in addressing issues specific to generative models. This gap calls for unlearning techniques tailored to the unique requirements of generative models, such as preserving overall model performance while ensuring the safe and complete removal of specific learned information.

Problem Formulation and Objectives

The paper presents a new problem formulation for machine unlearning in generative AI, based on three sets of data points (one possible formalization of the resulting objective follows the list):

  1. Target Forget Set ($\tilde{D}_f$) - Consists of specific instances within the training data that need to be unlearned.
  2. Retain Set ($D_r$) - Encompasses the remaining training data that must be preserved without degradation.
  3. Unseen Forget Set ($\hat{D}_f$) - Represents data points that resemble the targeted forget set but were not part of the original training data.
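
Using these three sets, a minimal way to write the unlearning objective is sketched below. The loss notation $\mathcal{L}$ and the trade-off weights $\lambda_f$, $\lambda_r$ are illustrative assumptions rather than the paper's exact formulation, and $\hat{D}_f$ is held out for evaluating generalizability rather than optimized on directly.

```latex
% Minimal sketch: raise the loss on the target forget set while keeping it
% low on the retain set; \hat{D}_f is held out to test generalizability.
% The weights \lambda_f, \lambda_r are assumed trade-off hyperparameters.
\theta^{*} = \arg\min_{\theta}
  \Big[ \lambda_r \sum_{x \in D_r} \mathcal{L}(x;\theta)
      - \lambda_f \sum_{x \in \tilde{D}_f} \mathcal{L}(x;\theta) \Big]
```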

The evaluation of unlearning is anchored around three primary metrics (a minimal evaluation sketch follows the list):

  • Accuracy: Ensures that the unlearned model does not generate outputs associated with the forget set.
  • Locality: Measures the preservation of the model's performance on the retain set.
  • Generalizability: Assesses the unlearned model's ability to generalize the unlearning to unseen data similar to the forget set.
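
To make these metrics concrete, the sketch below computes all three for an unlearned model. The helper callables `generates_forgotten` and `utility` (e.g. a leakage detector and a task-quality score) are hypothetical placeholders supplied by the evaluator, not APIs from the survey.

```python
from typing import Any, Callable, Dict, Iterable

def evaluate_unlearning(
    model: Any,
    target_forget: Iterable[Any],   # D_f-tilde: instances to be unlearned
    retain: Iterable[Any],          # D_r: data whose behavior must be preserved
    unseen_forget: Iterable[Any],   # D_f-hat: similar data never seen in training
    generates_forgotten: Callable[[Any, Any], bool],  # hypothetical leakage check
    utility: Callable[[Any, Any], float],             # hypothetical quality score
) -> Dict[str, float]:
    target_forget, retain, unseen_forget = map(list, (target_forget, retain, unseen_forget))

    # Accuracy: fraction of forget-set prompts that no longer elicit the
    # forgotten content (higher is better).
    accuracy = 1.0 - sum(generates_forgotten(model, x) for x in target_forget) / len(target_forget)

    # Locality: average task utility preserved on the retain set.
    locality = sum(utility(model, x) for x in retain) / len(retain)

    # Generalizability: unlearning should also suppress the content on
    # similar-but-unseen forget data.
    generalizability = 1.0 - sum(generates_forgotten(model, x) for x in unseen_forget) / len(unseen_forget)

    return {"accuracy": accuracy, "locality": locality, "generalizability": generalizability}
```

A model unlearns well when accuracy and generalizability are high (the forgotten content no longer surfaces, even for paraphrases) while locality stays close to the pre-unlearning utility.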

Categorization of MU Techniques

MU strategies in Generative AI are broadly classified into two categories: Parameter Optimization and In-Context Unlearning.

Parameter Optimization

This category encompasses methods that adjust model parameters to selectively forget undesirable knowledge while retaining overall functionality. Key techniques include:

  • Gradient-Based Methods: Use a reversed (ascent) loss or standard gradient-based objectives to optimize the model's parameters for unlearning. Works such as LLMU and NPO demonstrate these methods, which trade off performance against computational cost (a hedged sketch follows this list).
  • Knowledge Distillation: Employs teacher-student frameworks to transfer desirable knowledge while omitting undesired information. Techniques like KGA and δ-learning exemplify this approach.
  • Data Sharding: Divides the training data into multiple shards, each representing a smaller subset, which can then be retrained as necessary.
  • Extra Learnable Layers: Integrates additional trainable layers within the model that can be fine-tuned for unlearning specific knowledge without affecting the original model parameters, as shown in EUL and Receler.
  • Task Vector Methods: Involve modifying task-specific model weight vectors to forget particular skills or data. SKU is a notable implementation.
  • Parameter Efficient Module Operations (PEMO): Apply localized adjustments within adapter modules to forget targeted knowledge, extending the benefits seen in task vector methods to a more modular framework.
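
As an illustration of the gradient-based and task-vector families above, the sketch below implements a reverse-loss (gradient-ascent) unlearning step and a task-vector negation. It assumes PyTorch and a HuggingFace-style causal LM interface, and the weighting, gradient clipping, and scaling factor are assumptions rather than the exact recipes of LLMU, NPO, or SKU.

```python
import torch

def unlearning_step(model, forget_batch, retain_batch, optimizer,
                    lam_forget=1.0, lam_retain=1.0):
    """One gradient-based unlearning step: ascend the loss on the forget
    batch while descending it on the retain batch (a hedged sketch)."""
    model.train()
    optimizer.zero_grad()
    # Standard next-token loss on data whose behavior should be preserved.
    # Assumes batches contain input_ids / attention_mask but no labels key.
    retain_loss = model(**retain_batch, labels=retain_batch["input_ids"]).loss
    # The same loss on the forget data, but negated below: gradient ascent
    # pushes the model away from reproducing this content.
    forget_loss = model(**forget_batch, labels=forget_batch["input_ids"]).loss
    (lam_retain * retain_loss - lam_forget * forget_loss).backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # ascent can be unstable
    optimizer.step()
    return retain_loss.item(), forget_loss.item()

@torch.no_grad()
def apply_negated_task_vector(base_state, finetuned_state, alpha=1.0):
    """Task-vector style unlearning: subtract the direction learned by
    fine-tuning on the unwanted data from the base weights. Assumes both
    state dicts contain matching floating-point parameter tensors."""
    return {name: base_state[name] - alpha * (finetuned_state[name] - base_state[name])
            for name in base_state}
```

NPO, for instance, replaces the plainly negated loss with a preference-style objective specifically to avoid the catastrophic collapse that naive gradient ascent can cause.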

In-Context Unlearning

In-Context Unlearning manipulates the context or environment in which the model operates to induce unlearning without altering the model parameters. Techniques such as ICUL use prompt adjustments during inference to achieve unlearning, targeting the problem predominantly through black-box interactions.
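
A minimal sketch of this black-box setting is shown below: an ICUL-style prompt presents the instances to be forgotten with flipped labels alongside correctly labeled retained examples, and unlearning is induced purely at inference time. The template wording and helper name are illustrative assumptions, not the exact ICUL prompt format.

```python
def build_icul_prompt(forget_examples, retain_examples, query):
    """Assemble an in-context-unlearning style prompt: points to forget are
    shown with flipped labels, retained points with their true labels, and
    the model is queried without any weight update (illustrative template)."""
    lines = []
    for text, _true_label, flipped_label in forget_examples:
        lines.append(f"Input: {text}\nLabel: {flipped_label}")   # flipped label
    for text, label in retain_examples:
        lines.append(f"Input: {text}\nLabel: {label}")           # correct label
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

# Usage: send the prompt to any black-box LLM endpoint at inference time;
# the underlying weights are never modified.
prompt = build_icul_prompt(
    forget_examples=[("The review was wonderful.", "positive", "negative")],
    retain_examples=[("The plot was dull.", "negative")],
    query="The acting was superb.",
)
```

Because the weights never change, this approach suits settings where only query access to the model is available, at the cost of weaker guarantees than parameter-level unlearning.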

Datasets and Benchmarking

The survey provides an extensive list of datasets used for different unlearning objectives:

  • Safety Alignment - Includes datasets like Civil Comments and Anthropic Red Team for evaluating the generation of harmful content.
  • Privacy Compliance - Lists datasets such as The Pile and IMDB to evaluate the model's compliance with privacy requirements.
  • Hallucination Reduction - Utilizes datasets like TruthfulQA and CounterFact to manage information accuracy.
  • Bias/Unfairness Alleviation - Involves datasets like StereoSet to address and rectify biases.

Applications and Implications

Machine unlearning in generative AI holds substantial promise in several practical applications:

  • Safety Alignment: Ensuring generated content is free from harmful biases and inappropriate knowledge.
  • Privacy Compliance: Complying with data protection regulations by enabling models to forget specific user data.
  • Copyright Protection: Safeguarding intellectual property by allowing models to unlearn content derived from copyrighted materials.
  • Hallucination Reduction: Reducing the incidence of inaccurate or hallucinated responses in generated content.
  • Bias/Unfairness Alleviation: Removing biases to enhance the fairness of generative models.

Challenges and Future Directions

Despite recent advancements, several challenges persist:

  1. Consistency of Unlearning Targets: Maintaining consistent unlearning outcomes amidst evolving knowledge bases.
  2. Robust Unlearning: Enhancing resilience against jailbreak and backdoor attacks.
  3. Knowledge Entanglement: Addressing interdependencies between different pieces of knowledge without compromising model performance.
  4. Theoretical Analysis: Bridging the gap between practical applications and theoretical guarantees.

Future work in this domain is expected to focus on refining these techniques, ensuring scalability, robustness, and consistency across various generative AI contexts.

Conclusion

This survey provides a comprehensive landscape of machine unlearning techniques in Generative AI, laying the groundwork for future research and development in this essential domain. By addressing the critical challenges it identifies and refining the surveyed methodologies, the field is poised to make significant advances toward safer, more reliable, and privacy-compliant generative models.
