SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM (2401.01128v1)

Published 2 Jan 2024 in cs.CV

Abstract: Text-to-image (T2I) synthesis has recently advanced significantly, particularly with the emergence of LLMs and their extension into Large Vision Models (LVMs), which greatly improve the instruction-following ability of traditional T2I models. However, previous methods focus on improving generation quality while introducing unsafe factors into prompts. We find that appending specific camera descriptions to prompts can improve safety performance. We therefore propose a simple and safe prompt engineering method (SSP) that improves image generation quality by supplying optimal camera descriptions. Specifically, we build a dataset of original prompts drawn from multiple existing datasets. To select the optimal camera, we design an optimal-camera matching approach and train a classifier that automatically matches original prompts to cameras. Appending the matched camera description to an original prompt yields an optimized prompt for further LVM image generation. Experiments demonstrate that SSP improves semantic consistency by an average of 16% over competing methods and safety metrics by 48.9%.
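The pipeline described in the abstract — classify an original prompt, match it to a camera description, and append that description to form the optimized prompt — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rule-based classifier, category names, and camera descriptions below are all hypothetical stand-ins for the learned classifier and matched descriptions the paper uses.

```python
# Hypothetical camera descriptions keyed by prompt category.
# The paper learns this matching; here we hard-code examples.
CAMERA_DESCRIPTIONS = {
    "portrait": "shot on a DSLR with an 85mm lens, shallow depth of field",
    "landscape": "shot with a wide-angle lens at f/8, golden hour lighting",
    "default": "shot on a 50mm lens, natural lighting",
}

def match_camera(prompt: str) -> str:
    """Stand-in for the paper's prompt classifier: map a prompt to a
    camera category using simple keyword rules (illustrative only)."""
    text = prompt.lower()
    if any(w in text for w in ("person", "face", "woman", "man")):
        return "portrait"
    if any(w in text for w in ("mountain", "valley", "beach", "forest")):
        return "landscape"
    return "default"

def optimize_prompt(prompt: str) -> str:
    """The SSP step: append the matched camera description to the
    original prompt to produce the optimized prompt."""
    return f"{prompt}, {CAMERA_DESCRIPTIONS[match_camera(prompt)]}"

print(optimize_prompt("a woman reading in a cafe"))
```

The optimized prompt would then be passed unchanged to the downstream LVM; the safety gain reported in the paper comes from steering generation with benign photographic detail rather than rewriting the user's content.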

Authors (4)
  1. Weijin Cheng
  2. Jianzhi Liu
  3. Jiawen Deng
  4. Fuji Ren