Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation (2402.12100v1)
Abstract: With the prevalence of text-to-image generative models, their safety has become a critical concern. Adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rates and inefficiency. We introduce Groot, the first automated framework that leverages tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive-element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL-E 3 and Midjourney.
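The abstract's pipeline — decompose a prompt into a tree of semantic elements, then "drown" the sensitive ones among benign fillers before reassembly — can be sketched roughly as follows. This is a hedged illustration only, not the paper's actual algorithm: the real framework drives decomposition and refinement with LLMs, whereas the helpers here (`decompose`, `drown`, `reassemble`) are hypothetical stand-ins operating on hand-supplied parts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the semantic tree: a text fragment plus child fragments."""
    text: str
    children: list = field(default_factory=list)


def decompose(prompt: str, parts: list[str]) -> Node:
    """Split a prompt into semantic sub-elements.

    In Groot this step would be performed by an LLM; here the caller
    supplies the decomposition explicitly (an assumption for the sketch).
    """
    root = Node(prompt)
    root.children = [Node(p) for p in parts]
    return root


def drown(root: Node, fillers: list[str]) -> Node:
    """Interleave benign filler elements among the sub-elements,
    diluting the relative weight of any sensitive fragment."""
    new_children: list[Node] = []
    for child in root.children:
        new_children.append(child)
        new_children.extend(Node(f) for f in fillers)
    root.children = new_children
    return root


def reassemble(root: Node) -> str:
    """Flatten the transformed tree back into a single prompt string."""
    return ", ".join(c.text for c in root.children)
```

Used iteratively, the tree could be re-decomposed and re-drowned until the target model accepts the prompt, which is the "systematic refinement" loop the abstract alludes to.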
- 2023. ChatGPT. https://chat.openai.com/. (2023).
- 2023. A Comprehensive Overview of Large Language Models. arXiv (2023). https://arxiv.org/abs/2307.06435.
- 2023. Content policy | DALL·E. https://labs.openai.com/policies/content-policy. (2023).
- 2023. DALL·E 3. https://openai.com/dall-e-3. (2023).
- 2023. Groot. https://sites.google.com/view/text-to-image-testing. (2023).
- 2023. Midjourney. https://www.midjourney.com/. (2023).
- 2023. "NSFWGPT" prompt thread. Reddit. https://www.reddit.com/r/ChatGPT/comments/11vlp7j/nsfwgpt_that_nsfw_prompt/. (2023).
- 2023. Shader - Wikipedia. https://en.wikipedia.org/wiki/Shader. (2023).
- 2023. Stable Diffusion — Stability AI. https://stability.ai/stable-diffusion. (2023).
- 2023. Vertex AI. https://cloud.google.com/vertex-ai. (2023).
- MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots. (2023). arXiv:cs.CR/2307.08715
- Siddhant Garg and Goutham Ramakrishnan. 2020. BAE: BERT-based Adversarial Examples for Text Classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 6174–6181. https://doi.org/10.18653/v1/2020.emnlp-main.498
- Ming Jiang and Jana Diesner. 2019. A Constituency Parsing Tree based Method for Relation Extraction from. EMNLP-IJCNLP 2019 (2019), 186.
- Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Proceedings of the AAAI Conference on Artificial Intelligence 34, 05 (Apr. 2020), 8018–8025. https://doi.org/10.1609/aaai.v34i05.6311
- TextBugger: Generating Adversarial Text Against Real-world Applications. In Proceedings 2019 Network and Distributed System Security Symposium (NDSS 2019). Internet Society. https://doi.org/10.14722/ndss.2019.23138
- Translation with source constituency and dependency trees. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1066–1076.
- Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models. (2023). arXiv:cs.CV/2305.13873
- Red-Teaming the Stable Diffusion Safety Filter. (2022). arXiv:cs.AI/2210.04610
- A Survey on Techniques in NLP. International Journal of Computer Applications 134, 8 (2016), 6–9.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684–10695.
- SneakyPrompt: Jailbreaking Text-to-image Generative Models. (2023). arXiv:cs.LG/2305.12082
- Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts. arXiv preprint arXiv:2309.10253 (2023).
- JADE: A Linguistics-based Safety Evaluation Platform for LLM. arXiv preprint arXiv:2311.00286 (2023).