Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search
Abstract: Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include: (i) natural-looking text prompts generating images with the wrong content, or (ii) different random samples of the latent variables generating vastly different, and even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the failure modes of TDMs in more detail. To achieve this, we propose SAGE, the first adversarial search method on TDMs that systematically explores the discrete prompt space and the high-dimensional latent space to automatically discover undesirable behaviors and failure cases in image generation. We use image classifiers as surrogate loss functions during the search, and use human inspection to validate the identified failures. For the first time, our method enables efficient exploration of both the discrete and intricate human language space and the challenging latent space, overcoming the gradient vanishing problem. We demonstrate the effectiveness of SAGE on five widely used generative models and reveal four typical failure modes: (1) We find a variety of natural text prompts that generate images failing to capture the semantics of the input text. We further discuss the underlying causes and potential solutions based on the results. (2) We find regions in the latent space that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured. (3) We also find latent samples that result in natural-looking images unrelated to the text prompt, implying a possible misalignment between the latent and prompt spaces. (4) By appending a single adversarial token embedding to any input prompt, we can generate a variety of specified target objects. Project page: https://sage-diffusion.github.io/
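The surrogate-loss search summarized in the abstract can be illustrated with a toy sketch. This is not the SAGE implementation: `toy_generator` and `toy_classifier` are hypothetical stand-ins for a diffusion sampler and an image classifier, and plain finite-difference gradient descent is swapped in for whatever gradient approximation the paper uses. The sketch also places no constraint on how plausible the latent remains. It only shows the basic loop: perturb a latent until the classifier's confidence in the prompt's class drops.

```python
import math
import random


def toy_generator(z, gen_w):
    """Hypothetical stand-in for a diffusion sampler: latent -> 'image' features."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in gen_w]


def toy_classifier(x, clf_w):
    """Hypothetical stand-in for the surrogate classifier: P(prompt class | image)."""
    score = sum(w * xi for w, xi in zip(clf_w, x))
    return 1.0 / (1.0 + math.exp(-score))


def surrogate_loss(z, gen_w, clf_w):
    """Classifier confidence in the prompt class; the search tries to lower it."""
    return toy_classifier(toy_generator(z, gen_w), clf_w)


def search_latent(z0, gen_w, clf_w, steps=100, lr=0.1, eps=1e-4):
    """Finite-difference gradient descent on the surrogate loss (an assumption,
    not the paper's optimizer). Returns the best latent seen and its loss."""
    z = list(z0)
    best_z, best_loss = list(z0), surrogate_loss(z0, gen_w, clf_w)
    for _ in range(steps):
        base = surrogate_loss(z, gen_w, clf_w)
        grad = []
        for i in range(len(z)):
            shifted = list(z)
            shifted[i] += eps
            grad.append((surrogate_loss(shifted, gen_w, clf_w) - base) / eps)
        z = [zi - lr * g for zi, g in zip(z, grad)]
        loss = surrogate_loss(z, gen_w, clf_w)
        if loss < best_loss:
            best_z, best_loss = list(z), loss
    return best_z, best_loss


def run_demo(seed=0, latent_dim=4, feature_dim=6):
    """Build random toy weights, then compare confidence before and after search."""
    rng = random.Random(seed)
    gen_w = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(feature_dim)]
    clf_w = [rng.gauss(0, 1) for _ in range(feature_dim)]
    z0 = [rng.gauss(0, 1) for _ in range(latent_dim)]
    initial = surrogate_loss(z0, gen_w, clf_w)
    _, final = search_latent(z0, gen_w, clf_w)
    return initial, final
```

Calling `run_demo()` returns the toy classifier's confidence before and after the search. In the real setting, a low classifier score is only a candidate failure; as the abstract notes, the identified cases are then validated by human inspection.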