Paragraph-to-Image Generation with Information-Enriched Diffusion Model (2311.14284v2)

Published 24 Nov 2023 in cs.CV

Abstract: Text-to-image (T2I) models have recently experienced rapid development, achieving astonishing performance in terms of fidelity and textual alignment capabilities. However, given a long paragraph (up to 512 words), these generation models still struggle to achieve strong alignment and are unable to generate images depicting complex scenes. In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of LLMs to the task of image generation. At its core is using an LLM (e.g., Llama V2) to encode long-form text, followed by fine-tuning with LoRA to align the text-image feature spaces in the generation task. To facilitate the training of long-text semantic alignment, we also curated a high-quality paragraph-image pair dataset, namely ParaImage. This dataset contains a small amount of high-quality, meticulously annotated data, and a large-scale synthetic dataset with long text descriptions generated using a vision-LLM. Experiments demonstrate that ParaDiffusion outperforms state-of-the-art models (SD XL, DeepFloyd IF) on ViLG-300 and ParaPrompts, achieving up to 15% and 45% human voting rate improvements for visual appeal and text faithfulness, respectively. The code and dataset will be released to foster community research on long-text alignment.
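
The abstract names LoRA as the fine-tuning method but gives no implementation details. As a rough illustration of the general LoRA idea only (not the authors' code), the sketch below keeps a weight matrix frozen and adds a trainable low-rank update scaled by `alpha / rank`. The class name `LoRALinear`, its parameters, and the toy dimensions are hypothetical and chosen purely for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer with a frozen weight and a low-rank LoRA update.

    Hypothetical illustration: output = W x + (alpha / rank) * B (A x),
    where W is frozen and only A and B would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen, shape (out_dim, in_dim)
        out_dim, in_dim = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts with small random values, shape (rank, in_dim).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        # B starts at zero, shape (out_dim, rank), so the update is
        # initially zero and the layer matches the frozen model.
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Count the entries of A and B (the only trainable parts)."""
        return (len(self.A) * len(self.A[0])
                + len(self.B) * len(self.B[0]))
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only has to learn the entries of `A` and `B` rather than the full weight matrix.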

Authors (10)
  1. Weijia Wu (47 papers)
  2. Zhuang Li (69 papers)
  3. Yefei He (19 papers)
  4. Mike Zheng Shou (165 papers)
  5. Chunhua Shen (404 papers)
  6. Lele Cheng (6 papers)
  7. Yan Li (505 papers)
  8. Tingting Gao (25 papers)
  9. Di Zhang (230 papers)
  10. Zhongyuan Wang (105 papers)
Citations (17)