- The paper presents the MACE framework that scales concept erasure to 100 targets, enhancing safety in text-to-image models.
- MACE balances generality and specificity by integrating cross-attention refinement with LoRA finetuning to avoid over-correction.
- Evaluations across object, celebrity, explicit content, and style tasks demonstrate MACE's superiority over state-of-the-art methods.
MACE: Mass Concept Erasure in Diffusion Models
The paper "MACE: Mass Concept Erasure in Diffusion Models" introduces a finetuning framework that addresses a key risk of text-to-image (T2I) diffusion models: their potential misuse for generating harmful or inappropriate content. MACE, short for MAss Concept Erasure, is designed to erase undesirable concepts from such models efficiently while preserving their ability to generate unrelated content.
Key Contributions
- Efficacy in Concept Erasure: The MACE framework significantly expands the scope of concept erasure. Whereas existing methods typically handle fewer than five concepts at a time, MACE scales to erasing up to 100 concepts. This is particularly valuable for protecting celebrity likenesses and copyrighted material, and for preventing the generation of explicit content.
- Generality and Specificity Balance: MACE achieves a nuanced balance between generality and specificity. Generality ensures the removal of all expressions and synonyms related to a concept, while specificity protects unrelated concepts, even when they share common terms with the targeted concepts.
- Technical Approach: To achieve this balance, MACE combines cross-attention refinement with LoRA (Low-Rank Adaptation) finetuning. Together, these suppress the residual information about a target concept that is embedded in co-existing words, ensuring the concept is erased comprehensively (a minimal LoRA sketch appears after this list).
- Closed-Form Solutions: The paper derives closed-form solutions both for refining the model's cross-attention layers and for integrating multiple LoRA modules. This makes erasure efficient and keeps it from interfering with the model's retained capabilities (a generic closed-form sketch also follows the list).
- Evaluation and Results: Extensive evaluations across four tasks (object, celebrity, explicit content, and artistic style erasure) show that MACE surpasses existing state-of-the-art methods in all evaluated dimensions, striking a balance between erasure efficacy and preservation of the model's ability to generate unrelated content.
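
To make the LoRA component concrete, the sketch below shows the general mechanism of attaching a trainable low-rank update to a frozen cross-attention projection. It is a minimal illustration only: the class name, rank, dimensions, and the choice to wrap a single value projection are assumptions for illustration, not MACE's actual implementation or training objective.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear projection plus a trainable low-rank update: W x + (B A) x.

    In MACE-style erasure, modules of this general form are attached to the
    U-Net's cross-attention projections, and only the low-rank factors are
    trained so that residual information about a target concept is suppressed.
    Class name, rank, and dimensions here are illustrative, not the paper's.
    """

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the pretrained weights frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus the learned low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_A.T) @ self.lora_B.T


# Illustrative usage: wrap a hypothetical cross-attention value projection.
to_v = nn.Linear(768, 320, bias=False)       # text-embedding dim -> attention dim
to_v_lora = LoRALinear(to_v, rank=4)
text_emb = torch.randn(1, 77, 768)           # a CLIP-style prompt embedding
print(to_v_lora(text_emb).shape)             # torch.Size([1, 77, 320])
```

Because only the low-rank factors are trained, one module can be learned per erased concept while the base weights stay untouched until the modules are fused.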
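
The closed-form updates can be understood as regularized least-squares edits of a linear projection: remap a set of embeddings to new outputs while penalizing deviation on embeddings whose behavior should be preserved. The function below sketches a generic solution of that form, with assumed variable names and a hypothetical preservation weight; the exact objectives MACE uses for cross-attention refinement and multi-LoRA fusion differ in their details.

```python
import torch


def closed_form_projection_update(
    W_old: torch.Tensor,   # (d_out, d_in) pretrained projection weights
    E_map: torch.Tensor,   # (n_map, d_in) embeddings to remap (e.g., target-concept tokens)
    V_map: torch.Tensor,   # (n_map, d_out) desired outputs for those embeddings
    E_keep: torch.Tensor,  # (n_keep, d_in) embeddings whose behavior should be preserved
    lam: float = 0.1,      # preservation weight (hypothetical value)
) -> torch.Tensor:
    """Regularized least-squares edit of a linear projection, in closed form.

    Minimizes  sum_i ||W e_i - v_i||^2 + lam * sum_j ||W e_j - W_old e_j||^2
    over W. MACE's refinement and fusion steps are solved in this general
    spirit, but their exact formulations differ from this simplified sketch.
    """
    d_in = W_old.shape[1]
    keep_cov = E_keep.T @ E_keep                                     # sum_j e_j e_j^T
    rhs = V_map.T @ E_map + lam * W_old @ keep_cov                   # right-hand side
    lhs = E_map.T @ E_map + lam * keep_cov + 1e-6 * torch.eye(d_in)  # symmetric system matrix
    # W @ lhs = rhs, and lhs is symmetric, so W^T = solve(lhs, rhs^T).
    return torch.linalg.solve(lhs, rhs.T).T


# Illustrative usage with random data of hypothetical sizes.
W_old = torch.randn(320, 768)
E_map, V_map = torch.randn(5, 768), torch.randn(5, 320)
E_keep = torch.randn(100, 768)
print(closed_form_projection_update(W_old, E_map, V_map, E_keep).shape)  # torch.Size([320, 768])
```

Solving a small linear system per projection avoids iterative finetuning for these steps, which is what makes the refinement and fusion efficient.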
Implications and Future Work
The implications of the MACE framework are both practical and theoretical. Practically, it offers a robust solution for content moderation in AI models, thereby paving the way for safer applications of T2I models. Theoretically, it opens avenues for future research into large-scale concept erasure, particularly as models continue to grow in size and complexity. Further exploration could focus on enhancing MACE's scalability, improving its adaptability to more advanced AI models, and refining its application to diverse datasets beyond the current four tasks.
In summary, MACE represents a significant advance in AI content generation, offering a finely tuned mechanism for controlling unwanted content generation while preserving the integrity and creativity of T2I models.