
Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering (2401.06345v1)

Published 12 Jan 2024 in cs.CV

Abstract: Text-to-image synthesis with diffusion models has recently shown remarkable performance in generating high-quality images. Although these models perform well on simple texts, they may get confused when faced with complex texts that contain multiple objects or spatial relationships. One feasible way to obtain the desired images is to manually adjust the textual descriptions, i.e., rewording the texts or adding some words, which is labor-intensive. In this paper, we propose a framework that learns proper textual descriptions for diffusion models through prompt learning. By utilizing quality guidance and semantic guidance derived from the pre-trained diffusion model, our method can effectively learn prompts that improve the match between the input text and the generated images. Extensive experiments and analyses validate the effectiveness of the proposed method.

Authors (6)
  1. Chang Yu (47 papers)
  2. Junran Peng (30 papers)
  3. Xiangyu Zhu (85 papers)
  4. Zhaoxiang Zhang (161 papers)
  5. Qi Tian (314 papers)
  6. Zhen Lei (205 papers)
Citations (2)