
BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators (2402.15218v1)

Published 23 Feb 2024 in cs.CR, cs.CL, and cs.CV

Abstract: Extremely large image generators offer significant transformative potential across diverse sectors, allowing users to design prompts that generate realistic images through black-box APIs. However, studies reveal that image generators are notably susceptible to attacks: manually designed toxic texts, often imperceptible to human observers, can induce them to generate Not Suitable For Work (NSFW) content. A multitude of universal and transferable attack prompts is urgently needed to probe and improve the safety of image generators, especially black-box-released APIs; yet manually crafted prompts are constrained by labor-intensive design and depend heavily on the quality of the given instructions. To address this, we introduce a black-box stealthy prompt attack (BSPA) that adopts a retriever to simulate attacks from API users. BSPA harnesses filter scores to tune the retrieval space of sensitive words so that they match the input prompts, thereby crafting stealthy prompts tailored to image generators. Significantly, this approach is model-agnostic and requires no access to the model's internal features, ensuring its applicability to a wide range of image generators. Building on BSPA, we construct an automated prompt tool and a comprehensive prompt attack dataset (NSFWeval). Extensive experiments demonstrate that BSPA effectively exposes security vulnerabilities in a variety of state-of-the-art black-box models, including Stable Diffusion XL, Midjourney, and DALL-E 2/3. Furthermore, we develop a resilient text filter and offer targeted recommendations to secure image generators against future prompt attacks.
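The abstract describes BSPA as a query-only loop: a retriever proposes candidate word substitutions, and black-box filter scores guide which substitutions to keep. The minimal sketch below illustrates that score-guided, derivative-free search pattern with entirely benign toy components; `black_box_score`, `retrieve_candidates`, and the greedy update rule are all illustrative assumptions, not the paper's actual retriever, scoring API, or optimization procedure.

```python
import random

def black_box_score(prompt: str) -> float:
    # Stand-in for a remote API's returned filter score. Purely a toy:
    # it rewards longer prompts, just to give the search a gradient-free signal.
    return len(prompt) / 100.0

def retrieve_candidates(slot_word: str, vocabulary: list[str], k: int = 5) -> list[str]:
    # Stand-in retriever. In BSPA a learned retriever would rank vocabulary
    # items against the input prompt; here we simply sample k candidates.
    return random.sample(vocabulary, min(k, len(vocabulary)))

def score_guided_search(prompt_words: list[str], vocabulary: list[str],
                        rounds: int = 10) -> list[str]:
    # Greedy black-box search: repeatedly pick a word position, query the
    # score for each retrieved substitution, and keep score-improving edits.
    best = prompt_words[:]
    best_score = black_box_score(" ".join(best))
    for _ in range(rounds):
        i = random.randrange(len(best))
        for cand in retrieve_candidates(best[i], vocabulary):
            trial = best[:i] + [cand] + best[i + 1:]
            score = black_box_score(" ".join(trial))
            if score > best_score:
                best, best_score = trial, score
    return best
```

The loop touches the target only through `black_box_score`, mirroring the paper's model-agnostic, no-internal-access setting; swapping in a real API response for the toy score is the only coupling point.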

Authors (6)
  1. Yu Tian
  2. Xiao Yang
  3. Yinpeng Dong
  4. Heming Yang
  5. Hang Su
  6. Jun Zhu
Citations (3)
