Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks (2306.13103v1)

Published 16 Jun 2023 in cs.CR, cs.AI, cs.CV, and cs.LG

Abstract: Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions. The real-world applications of these models require particular attention to their safety and fidelity, but this has not been sufficiently explored. One fundamental question is whether existing T2I DMs are robust against variations over input texts. To answer this question, this work provides the first robustness evaluation of T2I DMs against real-world attacks. Unlike prior studies that focus on malicious attacks involving apocryphal alterations to the input texts, we consider an attack space spanned by realistic errors (e.g., typo, glyph, phonetic) that humans can make, to ensure semantic consistency. Given the inherent randomness of the generation process, we develop novel distribution-based attack objectives to mislead T2I DMs. We perform attacks in a black-box manner without any knowledge of the model. Extensive experiments demonstrate the effectiveness of our method for attacking popular T2I DMs and simultaneously reveal their non-trivial robustness issues. Moreover, we provide an in-depth analysis of our method to show that it is not solely designed to attack the text encoder in T2I DMs.
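To illustrate the kind of human-plausible input variations the abstract describes, the sketch below shows two of the perturbation types (typo and glyph) applied to a text prompt. This is a minimal illustrative sketch only: the function names and homoglyph table are assumptions, and the paper's actual method additionally searches this space with distribution-based attack objectives, which is not reproduced here.

```python
import random

# Illustrative sketch: simple human-plausible prompt perturbations of the
# kind the attack space covers (typo, glyph). Not the paper's algorithm.

# Latin -> Cyrillic look-alike characters (a small assumed sample)
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441"}

def typo_perturb(text: str, rng: random.Random) -> str:
    """Swap two adjacent letters inside the text (a common typing slip)."""
    chars = list(text)
    positions = [i for i in range(len(chars) - 1)
                 if chars[i].isalpha() and chars[i + 1].isalpha()]
    if positions:
        i = rng.choice(positions)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def glyph_perturb(text: str, rng: random.Random) -> str:
    """Replace one character with a visually similar Unicode look-alike."""
    chars = list(text)
    positions = [i for i, ch in enumerate(chars) if ch in HOMOGLYPHS]
    if positions:
        i = rng.choice(positions)
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

rng = random.Random(0)
prompt = "a photo of a cat on the beach"
print(typo_perturb(prompt, rng))
print(glyph_perturb(prompt, rng))
```

Both perturbations leave the prompt semantically readable to a human while changing the token sequence the model sees, which is the property that distinguishes this attack space from free-form adversarial text.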

Authors (4)
  1. Hongcheng Gao (28 papers)
  2. Hao Zhang (945 papers)
  3. Yinpeng Dong (102 papers)
  4. Zhijie Deng (58 papers)
Citations (16)