Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation (2405.06948v1)

Published 11 May 2024 in cs.CV

Abstract: Existing subject-driven text-to-image generation models require tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. When generating compositional subjects, they often encounter problems such as missing objects and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these limitations, we propose a subject-driven generation framework and introduce training-free guidance that intervenes in the generative process at inference time. This approach strengthens the attention map, allowing for precise attribute binding and feature injection for each subject. Notably, our method exhibits exceptional zero-shot generation ability, especially in the challenging task of compositional generation. Furthermore, we propose a novel metric, GroundingScore, to evaluate subject alignment thoroughly. The quantitative results demonstrate the effectiveness of the proposed method. The code will be released soon.

Authors (9)
  1. Bo Wang (823 papers)
  2. Ye Ma (19 papers)
  3. Te Yang (3 papers)
  4. Xipeng Cao (3 papers)
  5. Quan Chen (91 papers)
  6. Han Li (182 papers)
  7. Di Dong (4 papers)
  8. Peng Jiang (274 papers)
  9. ShengYuan Liu (9 papers)