Espresso: Robust Concept Filtering in Text-to-Image Models (2404.19227v6)

Published 30 Apr 2024 in cs.CV and cs.CR

Abstract: Diffusion-based text-to-image models are trained on large datasets scraped from the Internet, potentially containing unacceptable concepts (e.g., copyright-infringing or unsafe). We need concept removal techniques (CRTs) that are i) effective in preventing the generation of images with unacceptable concepts, ii) utility-preserving on acceptable concepts, and iii) robust against evasion with adversarial prompts. No prior CRT satisfies all these requirements simultaneously. We introduce Espresso, the first robust concept filter based on Contrastive Language-Image Pre-Training (CLIP). We identify unacceptable concepts by using the distance from the embedding of a generated image to the text embeddings of both unacceptable and acceptable concepts. This lets us fine-tune for robustness by separating the text embeddings of unacceptable and acceptable concepts while preserving utility. We present a pipeline to evaluate various CRTs to show that Espresso is more effective and robust than prior CRTs, while retaining utility.
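
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the embeddings have already been computed by a CLIP-style encoder, uses cosine similarity as the distance measure, and turns the two similarities into a score with a two-way softmax. The function names, the temperature, and the 0.5 threshold are hypothetical choices made for illustration only.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def unacceptable_score(
    image_emb: Sequence[float],
    unacceptable_emb: Sequence[float],
    acceptable_emb: Sequence[float],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Softmax weight of the unacceptable concept over the two concepts.

    A two-way softmax over (sim_u, sim_a) equals a sigmoid of their
    scaled difference, which is what is computed here.
    """
    sim_u = cosine_similarity(image_emb, unacceptable_emb)
    sim_a = cosine_similarity(image_emb, acceptable_emb)
    return _sigmoid((sim_u - sim_a) / temperature)


def is_filtered(
    image_emb: Sequence[float],
    unacceptable_emb: Sequence[float],
    acceptable_emb: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> bool:
    """Flag the image when it sits closer to the unacceptable concept."""
    return unacceptable_score(image_emb, unacceptable_emb, acceptable_emb) > threshold
```

In this sketch, a generated image is flagged when its embedding is closer to the text embedding of the unacceptable concept than to that of the acceptable one. Pushing those two text embeddings further apart, as the abstract describes for fine-tuning, would widen the gap between the two similarities.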


Authors (4)

Anudeep Das
Vasisht Duddu
Rui Zhang
N. Asokan


