Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation (2402.12100v1)

Published 19 Feb 2024 in cs.CL, cs.AI, cs.CR, and cs.SE

Abstract: With the prevalence of text-to-image generative models, their safety becomes a critical concern. Adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rates and inefficiency. We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL-E 3 and Midjourney.

