
VersaT2I: Improving Text-to-Image Models with Versatile Reward (2403.18493v1)

Published 27 Mar 2024 in cs.CV

Abstract: Recent text-to-image (T2I) models have benefited from large-scale, high-quality data and demonstrate impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to the text, and of good low-level quality. We present VersaT2I, a versatile training framework that uses multiple rewards to boost the performance of any T2I model. We decompose image quality into several aspects, such as aesthetics, text-image alignment, geometry, and low-level quality. Then, for each quality aspect, we select model-generated images that are of high quality in that aspect and use them as the training set to finetune the T2I model with Low-Rank Adaptation (LoRA). Furthermore, we introduce a gating function to combine multiple quality aspects, which avoids conflicts between them. Our method is easy to extend and does not require any manual annotation, reinforcement learning, or model architecture changes. Extensive experiments demonstrate that VersaT2I outperforms the baseline methods across various quality criteria.
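
The two steps the abstract names, per-aspect selection of self-generated images and gated combination of aspects, can be illustrated with a minimal Python sketch. The abstract gives neither the selection rule nor the form of the gating function, so everything here is an assumption made for illustration: the function names (`select_top_samples`, `gated_update`), the top-fraction cutoff, the softmax gate, and the use of plain lists of floats in place of images and LoRA outputs. It is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def select_top_samples(
    samples: Sequence[T],
    reward_fn: Callable[[T], float],
    keep_fraction: float = 0.25,
) -> List[T]:
    """Keep the highest-scoring fraction of samples for one quality aspect.

    Assumption: a fixed top-fraction cutoff; the abstract only says that
    high-quality generated images are selected per aspect.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(samples, key=reward_fn, reverse=True)
    keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:keep]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_update(
    base_output: Sequence[float],
    aspect_deltas: Dict[str, Sequence[float]],
    gate_logits: Dict[str, float],
) -> List[float]:
    """Add a gate-weighted mix of per-aspect deltas to the base output.

    Assumption: a softmax gate over aspects; each delta stands in for the
    contribution of one aspect-specific LoRA.
    """
    names = list(aspect_deltas)
    weights = softmax([gate_logits[name] for name in names])
    combined = list(base_output)
    for weight, name in zip(weights, names):
        for i, delta in enumerate(aspect_deltas[name]):
            combined[i] += weight * delta
    return combined
```

In this sketch, `select_top_samples` would be called once per quality aspect with that aspect's reward model as `reward_fn`, and `gated_update` shows one way the resulting aspect-specific adapters could be blended so that no single aspect dominates.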

Authors (9)
  1. Jianshu Guo (3 papers)
  2. Wenhao Chai (50 papers)
  3. Jie Deng (25 papers)
  4. Hsiang-Wei Huang (22 papers)
  5. Tian Ye (65 papers)
  6. Yichen Xu (40 papers)
  7. Jiawei Zhang (529 papers)
  8. Jenq-Neng Hwang (103 papers)
  9. Gaoang Wang (68 papers)
Citations (11)