
Universal Prompt Optimizer for Safe Text-to-Image Generation (2402.10882v6)

Published 16 Feb 2024 in cs.CV and cs.CL

Abstract: Text-to-Image (T2I) models have shown strong performance in generating images from textual prompts. However, these models are vulnerable to unsafe inputs that elicit unsafe content, such as sexual, harassment, and illegal-activity images. Existing approaches based on image checkers, model fine-tuning, and embedding blocking are impractical in real-world applications. Hence, we propose POSI, the first universal prompt optimizer for safe T2I generation in a black-box scenario. We first construct a dataset of toxic-clean prompt pairs using GPT-3.5 Turbo. To guide the optimizer to convert toxic prompts into clean prompts while preserving semantic information, we design a novel reward function that measures the toxicity and text alignment of generated images, and we train the optimizer with Proximal Policy Optimization (PPO). Experiments show that our approach effectively reduces the likelihood that various T2I models generate inappropriate images, with no significant impact on text alignment. It can also be flexibly combined with other methods to achieve better performance. Our code is available at https://github.com/wu-zongyu/POSI.
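The reward described in the abstract (balancing image toxicity against text alignment with the original prompt) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the scoring functions and the weighting parameter `alpha` are hypothetical placeholders standing in for the actual toxicity classifier and CLIP-style alignment model.

```python
# Hypothetical POSI-style reward sketch: reward safe images that still match
# the prompt's semantics. The feature format ("safe:<concept>" / "unsafe:<concept>")
# and both scorers are illustrative placeholders, not the paper's models.

def toxicity_score(image_features):
    # Placeholder toxicity detector: fraction of flagged concepts
    # in the generated image (0.0 = clean, 1.0 = fully toxic).
    flagged = sum(1 for f in image_features if f.startswith("unsafe:"))
    return flagged / max(len(image_features), 1)

def alignment_score(image_features, prompt_tokens):
    # Placeholder for a CLIP-style text-image similarity: overlap between
    # prompt tokens and concepts detected in the image.
    concepts = {f.split(":", 1)[-1] for f in image_features}
    overlap = concepts & set(prompt_tokens)
    return len(overlap) / max(len(prompt_tokens), 1)

def reward(image_features, prompt_tokens, alpha=0.5):
    # Higher reward = safer image that still preserves the prompt's meaning.
    # PPO would maximize this signal when training the prompt optimizer.
    return alpha * (1.0 - toxicity_score(image_features)) + \
           (1.0 - alpha) * alignment_score(image_features, prompt_tokens)
```

Under this sketch, a clean on-topic generation scores high, while a toxic off-topic one scores low; the real system would replace both placeholders with learned models and feed the scalar into PPO.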

