MMA-Diffusion: MultiModal Attack on Diffusion Models (2311.17516v4)

Published 29 Nov 2023 in cs.CR and cs.CV

Abstract: In recent years, Text-to-Image (T2I) models have seen remarkable advancements and gained widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effectively circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion leverages both textual and visual modalities to bypass safeguards such as prompt filters and post-hoc safety checkers, thereby exposing the vulnerabilities in existing defense mechanisms.
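To make the text-modal half of the attack concrete, the sketch below illustrates the core idea the abstract describes: searching for an adversarial prompt whose CLIP text embedding stays close to a target concept while never containing blacklisted words, so it slips past keyword-based prompt filters. This is a minimal, gradient-free stand-in for the paper's gradient-driven token optimization, not the authors' implementation; the model name, vocabulary, blacklist, and (deliberately benign) target prompt are all illustrative assumptions.

```python
# Minimal sketch of the prompt-optimization idea behind MMA-Diffusion's
# text-modal attack. Greedy coordinate search is a stand-in for the
# paper's gradient-driven method; all names here are assumptions.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

model_id = "openai/clip-vit-large-patch14"  # text encoder used by SD v1.x
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id).eval()

def embed(prompt: str) -> torch.Tensor:
    """Pooled CLIP text embedding of a prompt (no gradients needed here)."""
    inputs = tokenizer(prompt, padding="max_length", truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**inputs).pooler_output.squeeze(0)

# Benign placeholder target; the real attack optimizes toward a filtered concept.
target = embed("a photo of a cat")

blacklist = {"cat"}                      # hypothetical keyword filter
vocab = [w for w in ["feline", "kitten", "animal", "pet", "photo", "cute"]
         if w not in blacklist]          # tiny illustrative search space

# At each position, keep the substitution that maximizes cosine similarity
# to the target embedding. Because "cat" is excluded from the vocabulary,
# the optimized prompt scores high similarity without ever containing the
# blacklisted word -- the property that defeats prompt filters.
prompt_tokens = ["a", "photo", "of", "a", "pet"]
for pos in range(len(prompt_tokens)):
    best, best_sim = prompt_tokens[pos], -1.0
    for candidate in [prompt_tokens[pos]] + vocab:
        prompt_tokens[pos] = candidate
        sim = float(torch.cosine_similarity(
            embed(" ".join(prompt_tokens)), target, dim=0))
        if sim > best_sim:
            best, best_sim = candidate, sim
    prompt_tokens[pos] = best

print("optimized prompt:", " ".join(prompt_tokens), f"(sim={best_sim:.3f})")
```

The full framework pairs this with a visual-modal attack, perturbing generated or edited images so that post-hoc safety checkers fail to flag them; that half is not sketched here.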

Authors (6)
  1. Yijun Yang (46 papers)
  2. Ruiyuan Gao (18 papers)
  3. Xiaosen Wang (30 papers)
  4. Tsung-Yi Ho (57 papers)
  5. Nan Xu (83 papers)
  6. Qiang Xu (129 papers)
Citations (30)