
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance (2210.16031v3)

Published 28 Oct 2022 in cs.CV and cs.CL

Abstract: Diffusion generative models have recently greatly improved the power of text-conditioned image generation. Existing image generation models fall mainly into two families, text-conditional diffusion models and cross-modal guided diffusion models, which are good at simple-scene and complex-scene image generation, respectively. In this work, we propose a simple yet effective approach, namely UPainting, to unify simple and complex scene image generation, as shown in Figure 1. Based on architecture improvements and diverse guidance schedules, UPainting effectively integrates cross-modal guidance from a pretrained image-text matching model into a text-conditional diffusion model that uses a pretrained Transformer LLM as the text encoder. Our key finding is that combining the strength of a large-scale Transformer LLM in understanding language with that of an image-text matching model in capturing cross-modal semantics and style is effective in improving the sample fidelity and image-text alignment of image generation. In this way, UPainting has a more general image generation capability and can generate images of both simple and complex scenes more effectively. To comprehensively compare text-to-image models, we further create a more general benchmark, UniBench, with well-written Chinese and English prompts covering both simple and complex scenes. We compare UPainting with recent models and find that UPainting greatly outperforms other models in terms of caption similarity and image fidelity in both simple and complex scenes. UPainting project page: https://upainting.github.io/
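As a rough illustration of what "integrating cross-modal guidance into a text-conditional diffusion model" can look like, the sketch below combines a classifier-free guidance step with a gradient from an image-text matching score, in the style of generic classifier guidance. It is not the UPainting implementation: the linear `toy_match_score`, the `proj` matrix, and the parameters `cfg_scale` and `match_scale` are hypothetical stand-ins chosen so the example runs with NumPy alone.

```python
import numpy as np

def classifier_free_eps(eps_cond, eps_uncond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the text-conditional one."""
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)

def toy_match_score(x, text_emb, proj):
    """Hypothetical image-text matching score: a linear 'image encoder'
    (proj) followed by a dot product with the text embedding."""
    return float((proj @ x) @ text_emb)

def toy_match_grad(text_emb, proj):
    """Gradient of toy_match_score with respect to x.
    It is constant here because the toy score is linear in x."""
    return proj.T @ text_emb

def guided_eps(eps_cond, eps_uncond, text_emb, proj,
               alpha_bar, cfg_scale, match_scale):
    """Shift the noise prediction against the matching-score gradient,
    as in generic classifier guidance (an assumption, not the paper's rule)."""
    eps = classifier_free_eps(eps_cond, eps_uncond, cfg_scale)
    grad = toy_match_grad(text_emb, proj)
    return eps - match_scale * np.sqrt(1.0 - alpha_bar) * grad

def predict_x0(x_t, eps, alpha_bar):
    """Clean-image estimate implied by a noise prediction at noise level alpha_bar."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)

# Toy data: random vectors stand in for images, embeddings and model outputs.
rng = np.random.default_rng(0)
dim, emb_dim = 16, 4
x_t = rng.normal(size=dim)
eps_cond = rng.normal(size=dim)
eps_uncond = rng.normal(size=dim)
text_emb = rng.normal(size=emb_dim)
proj = rng.normal(size=(emb_dim, dim))
alpha_bar = 0.5

eps_plain = guided_eps(eps_cond, eps_uncond, text_emb, proj,
                       alpha_bar, cfg_scale=3.0, match_scale=0.0)
eps_match = guided_eps(eps_cond, eps_uncond, text_emb, proj,
                       alpha_bar, cfg_scale=3.0, match_scale=0.1)

score_plain = toy_match_score(predict_x0(x_t, eps_plain, alpha_bar), text_emb, proj)
score_match = toy_match_score(predict_x0(x_t, eps_match, alpha_bar), text_emb, proj)
print(score_match > score_plain)
```

With `match_scale=0.0` the step reduces to plain classifier-free guidance; a positive `match_scale` pushes the implied clean-image estimate toward a higher matching score. A guidance schedule, in this framing, would amount to varying `match_scale` across denoising steps, though the specific schedules the paper uses are not given in the abstract.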

Authors (11)
  1. Wei Li (1121 papers)
  2. Xue Xu (4 papers)
  3. Xinyan Xiao (41 papers)
  4. Jiachen Liu (45 papers)
  5. Hu Yang (19 papers)
  6. Guohao Li (43 papers)
  7. Zhanpeng Wang (4 papers)
  8. Zhifan Feng (7 papers)
  9. Qiaoqiao She (9 papers)
  10. Yajuan Lyu (16 papers)
  11. Hua Wu (191 papers)
Citations (28)