Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models (2306.09869v3)

Published 16 Jun 2023 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: Despite the remarkable performance of text-to-image diffusion models in image generation tasks, recent studies have raised the issue that generated images sometimes cannot capture the intended semantic contents of the text prompts, a phenomenon often called semantic misalignment. To address this, here we present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors. Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder. Then, we obtain the gradient of the log posterior of context vectors, which can be updated and transferred to the subsequent cross-attention layer, thereby implicitly minimizing a nested hierarchy of energy functions. Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts. Using extensive experiments, we demonstrate that the proposed method is highly effective in handling various image generation tasks, including multi-concept generation, text-guided image inpainting, and real and synthetic image editing. Code: https://github.com/EnergyAttention/Energy-Based-CrossAttention.
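The compositional idea mentioned in the abstract, a linear combination of cross-attention outputs from different contexts, can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with identity query/key/value projections, and the function names (`cross_attention`, `compose`) and the coefficients are hypothetical choices made only for illustration. The energy-based context update itself is not shown.

```python
import math


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product cross-attention (assumption: identity projections).

    queries: list of n image-token vectors of dimension d
    context: list of m text-token vectors of dimension d
    Returns a list of n output vectors of dimension d.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in context
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)]
        )
    return out


def compose(queries, contexts, coeffs):
    """Weighted sum of cross-attention outputs, one output per context.

    coeffs are hypothetical mixing weights, not values from the paper.
    """
    outputs = [cross_attention(queries, c) for c in contexts]
    n, d = len(queries), len(queries[0])
    return [
        [sum(a * o[i][j] for a, o in zip(coeffs, outputs)) for j in range(d)]
        for i in range(n)
    ]


# Toy data: two image tokens, two contexts of different lengths.
queries = [[1.0, 0.0], [0.0, 1.0]]
ctx_a = [[1.0, 0.0], [0.0, 1.0]]
ctx_b = [[2.0, 2.0]]
mixed = compose(queries, [ctx_a, ctx_b], [1.0, 0.5])
```

Because each context is attended to separately and only the outputs are mixed, contexts of different token lengths can be combined without concatenating prompts.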

Authors (5)
  1. Geon Yeong Park (14 papers)
  2. Jeongsol Kim (15 papers)
  3. Beomsu Kim (28 papers)
  4. Sang Wan Lee (14 papers)
  5. Jong Chul Ye (210 papers)
Citations (15)
