Papers
Topics
Authors
Recent
2000 character limit reached

On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts

Published 25 Oct 2023 in cs.CR | (2310.16613v2)

Abstract: Malicious or manipulated prompts are known to exploit text-to-image models to generate unsafe images. Existing studies, however, focus on the passive exploitation of such harmful capabilities. In this paper, we investigate the proactive generation of unsafe images from benign prompts (e.g., a photo of a cat) through maliciously modified text-to-image models. Our preliminary investigation demonstrates that poisoning attacks are a viable method to achieve this goal but uncovers significant side effects, where unintended spread to non-targeted prompts compromises attack stealthiness. Root cause analysis identifies conceptual similarity as an important contributing factor to these side effects. To address this, we propose a stealthy poisoning attack method that balances covertness and performance. Our findings highlight the potential risks of adopting text-to-image models in real-world scenarios, thereby calling for future research and safety measures in this space.

Citations (8)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.