On the Vulnerability of Concept Erasure in Diffusion Models

Published 24 Feb 2025 in cs.LG, cs.AI, and cs.CR | (2502.17537v2)

Abstract: The proliferation of text-to-image diffusion models has raised significant privacy and security concerns, particularly regarding the generation of copyrighted or harmful images. In response, several concept erasure (defense) methods have been developed to prevent the generation of unwanted content through post-hoc finetuning. Conversely, concept restoration (attack) methods seek to recover supposedly erased concepts via adversarially crafted prompts. However, all existing restoration methods succeed only in the highly restrictive scenario of finding adversarial prompts tailored to some fixed seed. To address this, we introduce RECORD, a novel coordinate-descent-based restoration algorithm that finds adversarial prompts to recover erased concepts independently of the seed. Our extensive experiments demonstrate that RECORD consistently outperforms the current restoration methods by up to 17.8 times in this setting. Our findings further reveal the susceptibility of unlearned models to restoration attacks, providing crucial insights into the behavior of unlearned models under the influence of adversarial prompts.
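
To make the idea of a seed-independent, coordinate-descent prompt search concrete, here is a minimal toy sketch. It is not the RECORD algorithm: the abstract does not specify the objective, the token update rule, or the models involved. The function `toy_score`, the vocabulary `VOCAB`, and the hidden `TARGET` prompt are all hypothetical stand-ins for "how strongly does an image generated from this prompt and seed show the erased concept". The only points it illustrates are (a) updating one token position at a time and (b) scoring each candidate by its average over several seeds rather than a single fixed one.

```python
import random

# Hypothetical toy setup: none of these names come from the paper.
VOCAB = ["red", "blue", "cat", "dog", "sky"]
TARGET = ["blue", "dog", "sky"]   # stand-in for a prompt that restores the concept
SEEDS = [0, 1, 2, 3]              # several seeds, so the result is not tied to one

def toy_score(prompt, seed):
    """Stand-in objective: number of matching tokens plus small seed-dependent noise."""
    matches = sum(p == t for p, t in zip(prompt, TARGET))
    # Seeding with a string is deterministic across runs.
    noise = random.Random(f"{seed}:{' '.join(prompt)}").uniform(-0.2, 0.2)
    return matches + noise

def coordinate_descent(score, vocab, length, seeds, sweeps=3):
    """Greedy coordinate search: change one token position at a time and keep
    a change only if it raises the score averaged over all seeds."""
    def objective(p):
        return sum(score(p, s) for s in seeds) / len(seeds)

    prompt = [vocab[0]] * length
    best = objective(prompt)
    for _ in range(sweeps):
        improved = False
        for i in range(length):            # one coordinate (token position) at a time
            for tok in vocab:
                if tok == prompt[i]:
                    continue
                cand = prompt[:i] + [tok] + prompt[i + 1:]
                val = objective(cand)
                if val > best:
                    prompt, best, improved = cand, val, True
        if not improved:                   # a full sweep with no gain: stop early
            break
    return prompt, best

found, value = coordinate_descent(toy_score, VOCAB, len(TARGET), SEEDS)
print(found, round(value, 3))
```

In a real attack the score would come from running the unlearned diffusion model and measuring the presence of the erased concept, which is far more expensive than this toy function; averaging over seeds is what distinguishes this setting from a search tuned to one fixed seed.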
