R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model (2405.16341v2)

Published 25 May 2024 in cs.CV

Abstract: In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability to generate high-quality images from textual descriptions is challenged by the potential misuse of reproducing sensitive content. To address this critical issue, we introduce Robust Adversarial Concept Erase (RACE), a novel approach designed to mitigate these risks by enhancing the robustness of concept erasure methods for T2I models. RACE uses an adversarial training framework to identify and mitigate adversarial text embeddings, significantly reducing the Attack Success Rate (ASR). Impressively, RACE achieves a 30 percentage point reduction in ASR for the "nudity" concept against the leading white-box attack method. Our extensive evaluations demonstrate RACE's effectiveness in defending against both white-box and black-box attacks, marking a significant advancement in protecting T2I diffusion models from generating inappropriate or misleading imagery. This work underlines the essential need for proactive defenses that keep pace with rapidly advancing adversarial attacks. Our code is publicly available: https://github.com/chkimmmmm/R.A.C.E.
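
Since the abstract describes the method only at a high level, the following minimal PyTorch-style sketch illustrates the general pattern of adversarial training against text-embedding attacks that it alludes to. It is an illustrative sketch under assumptions, not the authors' released implementation (see the repository linked above): the `unet` callable and the `concept_target` / `erased_target` tensors are hypothetical placeholders for a denoising network and its supervision signals.

```python
# Hypothetical sketch of adversarial training for concept erasure in a T2I
# diffusion model. NOT the authors' released code; `unet` is assumed to be a
# callable unet(latents, t, text_emb) returning a noise prediction tensor.
import torch
import torch.nn.functional as F


def adversarial_embedding(unet, text_emb, latents, t, concept_target,
                          steps=10, eps=0.1, lr=0.01):
    """Search for a bounded text-embedding perturbation that makes the
    (supposedly erased) model reproduce the concept's denoising target."""
    delta = torch.zeros_like(text_emb, requires_grad=True)
    for _ in range(steps):
        noise_pred = unet(latents, t, text_emb + delta)
        # Attack objective: pull the prediction toward the concept's target.
        attack_loss = F.mse_loss(noise_pred, concept_target)
        (grad,) = torch.autograd.grad(attack_loss, delta)
        # Signed-gradient step, projected back into the eps-ball around zero.
        delta = (delta - lr * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (text_emb + delta).detach()


def adversarial_erasure_step(unet, optimizer, text_emb, latents, t,
                             concept_target, erased_target):
    """One outer step: craft an adversarial embedding, then fine-tune the
    model so that even this embedding yields the concept-erased output."""
    adv_emb = adversarial_embedding(unet, text_emb, latents, t, concept_target)
    noise_pred = unet(latents, t, adv_emb)
    # Defense objective: match the erased target instead of the concept.
    defense_loss = F.mse_loss(noise_pred, erased_target)
    optimizer.zero_grad()
    defense_loss.backward()
    optimizer.step()
    return defense_loss.item()
```

The inner loop mirrors a PGD-style signed-gradient attack on the text embedding, and the outer step follows the standard adversarial training recipe: the model is fine-tuned so that even the worst-case embedding found by the attack no longer reproduces the erased concept.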

Authors (3)
  1. Changhoon Kim (19 papers)
  2. Kyle Min (22 papers)
  3. Yezhou Yang (119 papers)
Citations (9)