
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution (2309.14122v3)

Published 25 Sep 2023 in cs.CV and cs.CR

Abstract: Advanced text-to-image models such as DALL·E 2 and Midjourney possess the capacity to generate highly realistic images, raising significant concerns regarding the potential proliferation of unsafe content. This includes adult, violent, or deceptive imagery of political figures. Despite claims of rigorous safety mechanisms implemented in these models to restrict the generation of not-safe-for-work (NSFW) content, we successfully devise and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant photorealistic NSFW images. We reveal the fundamental principles of such prompt attacks and suggest strategically substituting high-risk sections within a suspect prompt to evade closed-source safety measures. Our novel framework, SurrogatePrompt, systematically generates attack prompts, utilizing LLMs, image-to-text, and image-to-image modules to automate attack prompt creation at scale. Evaluation results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts, leading to the generation of counterfeit images depicting political figures in violent scenarios. Both subjective and objective assessments validate that the images generated from our attack prompts present considerable safety hazards.

Authors (8)
  1. Zhongjie Ba (22 papers)
  2. Jieming Zhong (2 papers)
  3. Jiachen Lei (4 papers)
  4. Peng Cheng (229 papers)
  5. Qinglong Wang (18 papers)
  6. Zhan Qin (54 papers)
  7. Kui Ren (169 papers)
  8. ZhiBo Wang (48 papers)
Citations (13)