MMA-Diffusion: MultiModal Attack on Diffusion Models (2311.17516v4)
Abstract: In recent years, Text-to-Image (T2I) models have seen remarkable advancements and gained widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly the generation of inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that poses a significant and realistic threat to the security of T2I models by effectively circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion leverages both the textual and visual modalities to bypass safeguards such as prompt filters and post-hoc safety checkers, thereby exposing vulnerabilities in existing defense mechanisms.
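To make the prompt-filter weakness concrete, here is a minimal toy sketch (not the paper's actual algorithm, and the blocklist, prompts, and function names are hypothetical): a naive keyword-matching filter of the kind a text-modality attack can evade, since a small character-level perturbation defeats exact string matching while the perturbed prompt can remain close to the original in a text encoder's embedding space.

```python
# Toy illustration only -- a hypothetical keyword-based prompt filter,
# not MMA-Diffusion's method or any deployed service's filter.

BLOCKLIST = {"nudity", "violence"}  # hypothetical banned terms


def prompt_filter(prompt: str) -> bool:
    """Return True if the prompt passes the filter (no banned tokens)."""
    tokens = prompt.lower().split()
    return not any(tok in BLOCKLIST for tok in tokens)


# A direct prompt is blocked, but a one-character substitution in the
# sensitive word slips past the exact-match check -- the kind of gap an
# adversarial prompt search exploits while preserving semantics for the
# downstream text encoder.
print(prompt_filter("a scene of violence"))   # blocked
print(prompt_filter("a scene of v1olence"))   # passes the naive filter
```

Real filters are more sophisticated (substring matching, learned classifiers), but the same mismatch applies: the filter operates on surface text while generation is driven by continuous embeddings, and adversarial prompts can decouple the two.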
Authors:
- Yijun Yang
- Ruiyuan Gao
- Xiaosen Wang
- Tsung-Yi Ho
- Nan Xu
- Qiang Xu