Automatic Jailbreaking of the Text-to-Image Generative AI Systems (2405.16567v2)

Published 26 May 2024 in cs.AI and cs.CR

Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on LLMs. At the same time, there are diverse safety risks that can cause the generation of malicious content by circumventing the alignment in LLMs, which is often referred to as jailbreaking. However, most previous work has focused only on text-based jailbreaking of LLMs, while the jailbreaking of text-to-image (T2I) generation systems has been relatively overlooked. In this paper, we first evaluate the safety of commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 12% and 17% of the attacks with naive prompts, respectively, while ChatGPT blocks 84% of them. We then propose a stronger automated jailbreaking pipeline for T2I generation systems, which produces prompts that bypass their safety guards. Our automated jailbreaking framework leverages an LLM optimizer to generate prompts that maximize the degree of violation in the generated images, without any weight updates or gradient computation. Surprisingly, our simple yet effective approach successfully jailbreaks ChatGPT, reducing its block rate to 11.0% and making it generate copyrighted content 76% of the time. Finally, we explore various defense strategies, such as post-generation filtering and machine unlearning techniques, but find that they are inadequate, which suggests the necessity of stronger defense mechanisms.

Authors (5)
  1. Minseon Kim (18 papers)
  2. Hyomin Lee (3 papers)
  3. Boqing Gong (100 papers)
  4. Huishuai Zhang (64 papers)
  5. Sung Ju Hwang (178 papers)
Citations (8)

Summary

  • The paper demonstrates that current T2I systems can be breached using an Automated Prompt Generation Pipeline, reducing ChatGPT’s block rate from 84% to 11%.
  • It empirically compares systems like ChatGPT, Copilot, and Gemini, revealing significant disparities in their abilities to block copyright-infringing prompts.
  • It underscores the urgent need for more robust, adaptive safety mechanisms to mitigate risks of unauthorized copyright reproduction.

Automatic Jailbreaking of Text-to-Image Generative AI Systems: A Critical Exposition

The paper, "Automatic Jailbreaking of the Text-to-Image Generative AI Systems," presents an analytical paper targeting the vulnerabilities of contemporary text-to-image (T2I) generative AI systems concerning copyright infringement. The paper elucidates the capacity of these systems to bypass internal safety mechanisms meant to prevent unauthorized reproduction of copyrighted materials, highlighting a significant concern given the proliferation of such technologies in commercial applications.

Key Aspects of the Study

The paper emphasizes two pivotal components:

  1. Evaluation of Current T2I Systems: The paper examines commercial T2I systems such as ChatGPT, Copilot, and Gemini with respect to their ability to block simple but potentially infringing prompts. The empirical findings show a stark disparity in the efficacy of these mechanisms, with ChatGPT outperforming others by blocking around 84% of infringing prompts, while Copilot and Gemini performed significantly worse, blocking only 12% and 17% of such prompts, respectively.
  2. Development of an Automated Jailbreaking Pipeline: In response to the inadequacies observed in existing systems, the authors propose the Automated Prompt Generation Pipeline (APGP), a framework that optimizes prompt construction to systematically evade detection mechanisms. This approach leverages an LLM optimizer to generate high-risk prompts that maximize the chance of evoking copyright-violating outputs, without requiring model updates or gradient computation; a minimal sketch of such an optimization loop follows this list.
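
To make the black-box character of this search concrete, here is a minimal sketch of an APGP-style optimization loop. Every name in it is a hypothetical placeholder rather than the authors' actual interface: `propose` stands in for the LLM optimizer that rewrites prompts, `generate` for the target T2I system, and `score` for a judge that rates how closely an output reproduces a copyrighted target.

```python
from typing import Callable


def optimize_prompt(
    seed_prompt: str,
    propose: Callable[[str, float], str],  # LLM optimizer: (best prompt so far, its score) -> new candidate
    generate: Callable[[str], object],     # target T2I system: prompt -> image, or a refusal sentinel
    score: Callable[[object], float],      # judge: image -> violation score in [0, 1]; refusals score 0.0
    budget: int = 20,
) -> tuple[str, float]:
    """Greedy black-box search over prompts: query the system, score the
    output, and keep whichever prompt scores highest. No weights are
    updated and no gradients are computed, matching the paper's setting."""
    best_prompt = seed_prompt
    best_score = score(generate(seed_prompt))
    for _ in range(budget):
        candidate = propose(best_prompt, best_score)
        candidate_score = score(generate(candidate))
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score
```

The key design point is that only query access to the target system is needed: the LLM optimizer conditions on the best prompt and score found so far, so the loop improves prompts purely through trial and feedback.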

Results and Implications

The paper’s experimental results underscore a critical gap in the defenses of current T2I systems. The authors demonstrate a drastic reduction in ChatGPT’s block rate, from 84% to 11%, using their APGP-generated prompts. A salient finding is that ChatGPT’s comparatively robust defenses still yield infringing content in 76% of the cases presented, challenging the presumed adequacy of existing safety mechanisms.

The work demonstrates that T2I systems, despite internal safeguards and alignment protocols, remain susceptible to meticulously crafted prompts. This casts significant doubt on the reliability of current safety protocols embedded within these AI models, especially those deployed in commercial environments where compliance with copyright law is paramount.

Theoretical and Practical Implications

From a theoretical standpoint, the paper signals a compelling need for more robust and adaptive defense mechanisms that can withstand the evolving sophistication of prompt engineering practices. The findings suggest that the integration of stronger, context-aware, and possibly real-time analytical frameworks is essential in mitigating the risks of unauthorized content reproduction.

Practically, the paper highlights a pressing need for AI developers and service providers to revamp their methodologies for red-teaming and scrutinizing T2I systems. The ease with which the APGP framework bypasses existing security measures indicates a vulnerability that could expose companies to substantial legal and ethical liabilities associated with intellectual property (IP) infringement.

Future Directions

Looking forward, the research community is likely to concentrate on augmenting defenses within T2I frameworks by leveraging more granular content-filtering systems and employing models that can dynamically adapt to new linguistic patterns indicative of potential violations. Additionally, enhancing transparency in AI model operations and the datasets underpinning these generative processes could facilitate better oversight and accountability, thus fostering trust and safety in AI applications.
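
As a concrete illustration, the post-generation filtering the paper evaluates (and finds inadequate on its own) can be approximated by comparing each generated image against a gallery of known copyrighted reference images in a shared embedding space. The sketch below uses CLIP via Hugging Face transformers; the checkpoint choice and the 0.85 threshold are illustrative assumptions, not values from the paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP-style image encoder would do.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def is_blocked(generated: Image.Image,
               references: list[Image.Image],
               threshold: float = 0.85) -> bool:
    """Return True if the generated image embeds too close to any
    known copyrighted reference image (cosine similarity > threshold)."""
    inputs = processor(images=[generated, *references], return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize rows
    sims = feats[0] @ feats[1:].T                     # similarity to each reference
    return bool((sims > threshold).any())
```

A filter of this kind only catches near-duplicates of known references, which is consistent with the paper's finding that post-generation filtering alone is an insufficient defense.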

Overall, this paper serves as a critical reminder of the dynamic interplay between technological advancements in AI and the corresponding policy and enforcement frameworks needed to ensure ethical compliance and protection of IP rights in the digital age.
