- The paper introduces Selective Amnesia, a method that adapts continual learning techniques for controlled forgetting in deep generative models.
- It repurposes Elastic Weight Consolidation and generative replay to suppress specific concepts, training the model toward a surrogate distribution rather than directly minimizing the log-likelihood of the data to be forgotten.
- Experiments on models like Stable Diffusion and datasets such as MNIST validate its effectiveness, though concept leakage and computational costs remain challenges.
Selective Amnesia: A Continual Learning Approach to Controlled Forgetting in Deep Generative Models
This paper addresses a pertinent concern in the field of large-scale generative models—their potential misuse in generating misleading or harmful content, such as deepfakes or inappropriate images. The authors propose a novel approach, termed "Selective Amnesia" (SA), which allows for the controlled forgetting of specific concepts in pretrained deep generative models, including widely-used frameworks like Variational Autoencoders (VAEs) and text-to-image diffusion models.
The paper begins by acknowledging how difficult it is to purge harmful or inappropriate content from these models: the conventional remedy of filtering the training data and retraining from scratch is computationally prohibitive. To address this, the authors draw inspiration from continual learning, taking techniques originally designed to prevent forgetting and inverting them to achieve controlled forgetting.
Selective Amnesia unifies Elastic Weight Consolidation (EWC) and Generative Replay (GR) in a single objective. EWC, originally introduced to prevent forgetting by penalizing changes to parameters that matter for prior tasks (as measured by the Fisher information), is repurposed here to enable targeted forgetting: the model is pushed to lower the likelihood of the concept to be forgotten, while the EWC penalty and generative replay preserve its behavior on everything else, roughly as sketched below.
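In paraphrased form (the notation below is adapted rather than quoted from the paper; θ* denotes the pretrained weights, c_f and c_r the concepts to forget and to remember, and λ the strength of the regularizer), the objective to maximize is roughly:

```latex
% Paraphrased sketch of the controlled-forgetting objective (maximize over theta)
\mathcal{L}(\theta) =
  -\,\mathbb{E}_{x \sim p(x \mid c_f)}\!\big[\log p(x \mid \theta, c_f)\big]
  \;-\; \frac{\lambda}{2} \sum_i F_i \,(\theta_i - \theta_i^{*})^{2}
  \;+\; \mathbb{E}_{x \sim p(x \mid \theta^{*},\, c_r)}\!\big[\log p(x \mid \theta, c_r)\big],
\qquad
F_i = \mathbb{E}\!\left[\left(\frac{\partial \log p(x \mid \theta^{*}, c)}{\partial \theta_i}\right)^{2}\right].
```

The first term drives the forgetting, the Fisher-weighted penalty keeps parameters close to their pretrained values, and the last term is the generative-replay likelihood on samples drawn from the frozen pretrained model.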
An innovative component of the approach is the surrogate objective. Rather than directly decreasing the log-likelihood of the data to forget, which the authors show is often ineffective for variational models, SA maximizes the likelihood of a surrogate distribution q(x | c_f) that remaps the forgotten concept. This gives the user control over what the forgotten concept collapses to, whether uniform noise or any user-defined replacement, making the method applicable across diverse use cases.
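A minimal training-step sketch is given below, assuming a class-conditional likelihood-based model with a hypothetical `log_prob(x, cond=...)` interface; the function and argument names are illustrative and do not come from the authors' code:

```python
import torch

def selective_amnesia_step(model, optimizer,
                           surrogate_batch, c_forget,
                           replay_batch, c_remember,
                           fisher_diag, theta_star, lam=1.0):
    """One illustrative Selective-Amnesia-style update (not the authors' code).

    surrogate_batch : samples x ~ q(x | c_forget), e.g. uniform noise or a
                      user-chosen replacement concept.
    replay_batch    : samples generated by a frozen copy of the pretrained
                      model for the concepts to remember (generative replay).
    fisher_diag     : diagonal Fisher information estimated at theta_star.
    theta_star      : frozen copies of the pretrained parameters.
    """
    optimizer.zero_grad()

    # (1) Surrogate objective: push p(x | c_forget) toward q(x | c_forget)
    #     instead of directly minimizing likelihood of the true forget data.
    loss_forget = -model.log_prob(surrogate_batch, cond=c_forget).mean()

    # (2) Generative replay: keep likelihood high on replayed "remember" data.
    loss_replay = -model.log_prob(replay_batch, cond=c_remember).mean()

    # (3) EWC penalty: Fisher-weighted pull toward the pretrained weights.
    ewc = torch.zeros((), device=surrogate_batch.device)
    for p, p_star, f in zip(model.parameters(), theta_star, fisher_diag):
        ewc = ewc + (f * (p - p_star) ** 2).sum()

    loss = loss_forget + loss_replay + 0.5 * lam * ewc
    loss.backward()
    optimizer.step()
    return loss.item()
```

For a diffusion model the `log_prob` terms would be replaced by the usual denoising (ELBO-style) losses, but the three-term structure of the update stays the same.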
The experimental results provide substantial evidence for the efficacy of SA across datasets and generative models. Its ability to forget a range of concepts, such as suppressing the generation of specific digits or classes on MNIST, CIFAR10, and STL10, is quantified with an external classifier: the entropy of the classifier's predictions on samples generated under the forgotten condition approaches its maximum, and the probability assigned to the forgotten class drops accordingly.
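For concreteness, the entropy metric can be computed roughly as follows (an illustrative helper, assuming a pretrained classifier that returns per-class probabilities for the generated images):

```python
import torch

def mean_classifier_entropy(probs: torch.Tensor) -> float:
    """probs: (N, K) class probabilities from an external classifier,
    evaluated on N samples generated under the forgotten condition.
    Returns mean entropy in nats; log(K) is the maximum, reached when
    predictions are uniform (i.e., the forgotten class is no longer
    recognizable in the generations)."""
    eps = 1e-12
    entropy = -(probs * (probs + eps).log()).sum(dim=1)
    return entropy.mean().item()
```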
In case studies involving complex models like Stable Diffusion, SA demonstrated significant potential, showing that it could modify model behavior to map forgotten celebrity prompts to generic descriptions (e.g., "middle aged man/woman") or even unrelated concepts like clowns. A comparative analysis against methods like Safe Latent Diffusion (SLD) and Erasing Stable Diffusion (ESD) demonstrated that while these methods sometimes failed to produce faces when removing specific celebrity identities, SA maintained the capability to generate semantically coherent and visually plausible human figures, albeit not the original forgotten identities.
However, while SA shows clear advantages in controlled forgetting, it has limitations. The approach requires manually choosing an appropriate surrogate distribution, and it generalizes less well to broad "global" concepts than to precise "local" forgetting. Computational costs are also non-trivial: estimating the Fisher information matrix (FIM) and generating the samples needed for both the FIM estimate and generative replay are expensive for large-scale models, suggesting future work on more efficient implementations. Further, SA exhibits "concept leakage," where forgetting one concept degrades related ones, which complicates its reliability in real-world scenarios involving legal or ethical compliance.
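To illustrate where the FIM cost comes from: the diagonal Fisher information is typically estimated by Monte Carlo, sampling from the frozen pretrained model and accumulating squared gradients of the log-likelihood, roughly as below (an illustrative sketch using the same hypothetical `log_prob` and `sample_fn` interfaces as earlier, not the authors' implementation):

```python
import torch

def estimate_diag_fisher(model, sample_fn, conditions, n_samples=1000):
    """Monte Carlo estimate of the diagonal Fisher information at the current
    (pretrained) parameters. sample_fn(c) draws x ~ p(x | theta*, c); this
    per-sample generation is the expensive step for large diffusion models."""
    fisher = [torch.zeros_like(p) for p in model.parameters()]
    for _ in range(n_samples):
        c = conditions[torch.randint(len(conditions), (1,)).item()]
        x = sample_fn(c)                       # draw a sample from the model
        model.zero_grad()
        model.log_prob(x, cond=c).sum().backward()
        for f, p in zip(fisher, model.parameters()):
            if p.grad is not None:
                f += p.grad.detach() ** 2      # accumulate squared gradients
    return [f / n_samples for f in fisher]
```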
In conclusion, Selective Amnesia advances the discourse on how deep generative models can be regulated, adding crucial capabilities that allow researchers and practitioners to mitigate risks associated with content misuse while preserving legitimate uses. The methodology sets a promising precedent for enhancing the controllability of generative models and invites further exploration on systematically streamlining its computational demands and expanding its applicability.