InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization (2404.04650v1)
Abstract: Recent advances in diffusion models, exemplified by Stable Diffusion, have demonstrated a remarkable ability to generate visually compelling images. However, aligning the generated image with the given prompt remains a persistent challenge. This paper traces the difficulty to invalid initial noise and proposes Initial Noise Optimization (InitNO), a paradigm that refines this noise. Given a text prompt, not all random noise is equally effective at synthesizing semantically faithful images. We design a cross-attention response score and a self-attention conflict score to evaluate the initial noise, partitioning the initial latent space into valid and invalid regions. A carefully designed noise optimization pipeline then guides the initial noise toward the valid region. Extensive experiments validate that our method reliably generates images that closely adhere to the given text prompts. Our code is available at https://github.com/xiefan-guo/initno.
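The abstract only names the two scores and the optimization pipeline; the sketch below illustrates how such an initial-noise optimization loop could look in a PyTorch/Stable Diffusion setting. It is a minimal illustration, not the paper's implementation: the score formulas, the helper names (`cross_attention_response_score`, `self_attention_conflict_score`, `optimize_initial_noise`), and the hyperparameters (`steps`, `lr`, `threshold`) are all assumptions, and `score_fn` stands in for a single denoising step that exposes the model's cross- and self-attention maps.

```python
import torch

def cross_attention_response_score(cross_attn):
    # Assumed shape: (num_pixels, num_subject_tokens), the cross-attention
    # maps for the prompt's subject tokens. A weak maximum response for any
    # subject token suggests that token is not expressed in the latent.
    max_per_token = cross_attn.max(dim=0).values  # strongest response per token
    return 1.0 - max_per_token.min()              # penalize the most-neglected token

def self_attention_conflict_score(self_attn_a, self_attn_b):
    # Assumed shape: (num_pixels,) self-attention maps associated with two
    # different subject tokens; high overlap indicates the two subjects
    # collapse into one image region (subject mixing).
    return (self_attn_a * self_attn_b).sum()

def optimize_initial_noise(noise, score_fn, steps=50, lr=1e-2, threshold=0.2):
    # Gradient-based refinement of the initial latent toward the "valid"
    # region where the combined attention score is low. `score_fn` is assumed
    # to run one denoising step and combine the two scores above.
    noise = noise.clone().requires_grad_(True)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        loss = score_fn(noise)
        if loss.item() < threshold:  # noise deemed valid; stop early
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    return noise.detach()
```

A full pipeline would presumably also keep the optimized latent close to the standard Gaussian prior the diffusion model expects at the first timestep; that constraint is omitted from this sketch.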