
Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis (2311.17126v1)

Published 28 Nov 2023 in cs.CV and cs.CL

Abstract: Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts. Despite these advances, diffusion models sometimes struggle to translate the full semantic content of the text into images. While conditioning on a layout has been shown to be effective in improving the compositional ability of T2I diffusion models, such methods typically require manual layout input. In this work, we introduce a novel approach to improving T2I diffusion models using LLMs as layout generators. Our method leverages the Chain-of-Thought prompting of LLMs to interpret text and generate spatially reasonable object layouts. The generated layout is then used to enhance the composition and spatial accuracy of the generated images. Moreover, we propose an efficient adapter based on a cross-attention mechanism, which explicitly integrates the layout information into Stable Diffusion models. Our experiments demonstrate significant improvements in image quality and layout accuracy, showcasing the potential of LLMs in augmenting generative image models.
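
To make the layout-generation step more concrete, here is a minimal sketch of how an LLM's Chain-of-Thought response might be turned into object boxes that a layout-conditioned model could consume. The abstract does not specify the output format, so everything here is an assumption for illustration: the `label: [x0, y0, x1, y1]` line format, the normalized coordinates, and the hypothetical helpers `parse_layout` and `box_to_mask`. It is not the authors' implementation, and it does not cover the cross-attention adapter.

```python
import re
from typing import List, Tuple

# (label, x0, y0, x1, y1), coordinates normalized to [0, 1] (assumed convention)
Box = Tuple[str, float, float, float, float]

# Assumed format: one "label: [x0, y0, x1, y1]" per line of the LLM response.
_LINE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*\[(?P<nums>[^\]]+)\]\s*$")


def parse_layout(llm_text: str) -> List[Box]:
    """Extract valid boxes from an LLM response, ignoring free-form reasoning lines."""
    boxes: List[Box] = []
    for line in llm_text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # reasoning text or anything not in the assumed format
        try:
            nums = [float(v) for v in match.group("nums").split(",")]
        except ValueError:
            continue  # non-numeric coordinates
        if len(nums) != 4:
            continue
        # Clamp to the unit square, then drop degenerate or inverted boxes.
        x0, y0, x1, y1 = (min(max(v, 0.0), 1.0) for v in nums)
        if x1 <= x0 or y1 <= y0:
            continue
        boxes.append((match.group("label"), x0, y0, x1, y1))
    return boxes


def box_to_mask(box: Box, size: int = 8) -> List[List[int]]:
    """Rasterize one box to a size x size binary grid (1 where the cell center is inside)."""
    _, x0, y0, x1, y1 = box
    mask = []
    for row in range(size):
        cy = (row + 0.5) / size
        mask.append([
            1 if (x0 <= (col + 0.5) / size < x1 and y0 <= cy < y1) else 0
            for col in range(size)
        ])
    return mask


if __name__ == "__main__":
    response = (
        "Step 1: the cat sits to the left of the dog.\n"
        "cat: [0.0, 0.5, 0.5, 1.0]\n"
        "dog: [0.5, 0.5, 1.0, 1.0]\n"
    )
    for box in parse_layout(response):
        print(box)
```

A coarse grid mask like this is one common way to hand spatial constraints to an attention layer, but whether the paper uses masks, box embeddings, or another encoding is not stated in the abstract.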

Authors (7)
  1. Xiaohui Chen (73 papers)
  2. Yongfei Liu (25 papers)
  3. Yingxiang Yang (14 papers)
  4. Jianbo Yuan (33 papers)
  5. Quanzeng You (41 papers)
  6. Li-Ping Liu (27 papers)
  7. Hongxia Yang (130 papers)
Citations (9)