Automatic Jailbreaking of the Text-to-Image Generative AI Systems (2405.16567v2)
Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on LLMs. At the same time, diverse safety risks can lead to the generation of malicious content by circumventing the alignment of LLMs, which is often referred to as jailbreaking. However, most previous work has focused on text-based jailbreaking of LLMs, while jailbreaking of text-to-image (T2I) generation systems has been relatively overlooked. In this paper, we first evaluate the safety of commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, against copyright infringement with naive prompts. This empirical study shows that Copilot and Gemini block only 12% and 17% of such attacks, respectively, while ChatGPT blocks 84% of them. We then propose a stronger automated jailbreaking pipeline for T2I generation systems that produces prompts which bypass their safety guards. Our framework leverages an LLM optimizer to generate prompts that maximize the degree of violation in the generated images, without any weight updates or gradient computation. Surprisingly, this simple yet effective approach jailbreaks ChatGPT, reducing its block rate to 11.0% and making it generate copyrighted content 76% of the time. Finally, we explore various defense strategies, such as post-generation filtering and machine unlearning, but find them inadequate, which suggests the necessity of stronger defense mechanisms.
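The optimization loop the abstract describes can be sketched as a black-box search: an LLM optimizer rewrites the prompt using only a violation score as feedback, with no weight updates or gradients. This is a minimal illustrative sketch, not the authors' implementation; `violation_score`, `revise_prompt`, and the `render` callback are hypothetical stand-ins (the real pipeline would query a T2I system and score the resulting image, e.g. with a vision-language model).

```python
def violation_score(image_caption: str, target: str) -> float:
    """Stand-in scorer: fraction of the target description's words that
    appear in a caption of the generated image. The real pipeline would
    use a vision-language model to rate copyright similarity."""
    target_words = set(target.lower().split())
    hits = sum(1 for w in image_caption.lower().split() if w in target_words)
    return hits / max(len(target_words), 1)

def revise_prompt(prompt: str, score: float) -> str:
    """Stand-in for the LLM optimizer: rewrites the prompt given only
    score feedback (black-box, gradient-free). A placeholder rewrite."""
    return prompt + " in its iconic style"

def optimize_prompt(seed_prompt: str, target: str, render, steps: int = 5):
    """Keep the best-scoring prompt seen so far across revision steps.
    `render` maps a prompt to a caption of the generated image."""
    best_prompt = seed_prompt
    best_score = violation_score(render(seed_prompt), target)
    prompt = seed_prompt
    for _ in range(steps):
        prompt = revise_prompt(prompt, best_score)
        score = violation_score(render(prompt), target)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score
```

The key design point the abstract highlights is that the whole loop treats the T2I system as a black box: only the scored feedback flows back to the optimizer, so no access to model weights or gradients is required.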
- Minseon Kim
- Hyomin Lee
- Boqing Gong
- Huishuai Zhang
- Sung Ju Hwang