Information Theoretic Text-to-Image Alignment (2405.20759v1)

Published 31 May 2024 in cs.LG and cs.CV

Abstract: Diffusion models for Text-to-Image (T2I) conditional generation have seen tremendous success recently. Despite their success, accurately capturing user intentions with these models still requires a laborious trial-and-error process. This challenge is commonly identified as a model alignment problem, an issue that has attracted considerable attention from the research community. Instead of relying on fine-grained linguistic analyses of prompts, human annotation, or auxiliary vision-LLMs to steer image generation, in this work we present a novel method that relies on an information-theoretic alignment measure. In a nutshell, our method uses self-supervised fine-tuning and relies on point-wise mutual information between prompts and images to define a synthetic training set to induce model alignment. Our comparative analysis shows that our method is on par with or superior to the state of the art, yet requires nothing but a pre-trained denoising network to estimate mutual information (MI) and a lightweight fine-tuning strategy.
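The abstract describes the pipeline only at a high level. As a hedged illustration, the sketch below shows one way a pre-trained denoiser could score prompt-image pairs by point-wise mutual information (PMI) and keep the top-scoring generations as a synthetic fine-tuning set. It assumes a widely used identity for diffusion models (PMI equals half the integral, over log-SNR, of the expected squared gap between unconditional and conditional noise predictions), a variance-preserving noise schedule, and a log-SNR range truncated to [-10, 10]. The `Denoiser` interface, `estimate_pmi`, `select_training_set`, and the per-prompt top-k rule are hypothetical names and choices, not the authors' code or their exact estimator.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
# Hypothetical denoiser interface (an assumption, not the paper's API):
# (noisy sample z, log-SNR a, prompt or None for unconditional) -> predicted noise.
Denoiser = Callable[[Vector, float, Optional[str]], Vector]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def estimate_pmi(x: Vector, prompt: str, denoiser: Denoiser,
                 n_samples: int = 64, logsnr_min: float = -10.0,
                 logsnr_max: float = 10.0, seed: int = 0) -> float:
    """Monte Carlo estimate of i(x; prompt), assuming the identity
    i(x; y) = 1/2 * integral_a E_eps ||eps_hat(z_a) - eps_hat(z_a | y)||^2 da,
    with the log-SNR integral truncated to [logsnr_min, logsnr_max]."""
    rng = random.Random(seed)
    width = logsnr_max - logsnr_min
    total = 0.0
    for _ in range(n_samples):
        a = rng.uniform(logsnr_min, logsnr_max)  # log-SNR, sampled uniformly
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        # Variance-preserving forward process: alpha^2 = sigmoid(a), sigma^2 = sigmoid(-a),
        # so that alpha^2 / sigma^2 = exp(a).
        s_x, s_e = math.sqrt(sigmoid(a)), math.sqrt(sigmoid(-a))
        z = [s_x * xi + s_e * ei for xi, ei in zip(x, eps)]
        uncond = denoiser(z, a, None)
        cond = denoiser(z, a, prompt)
        total += sum((u - c) ** 2 for u, c in zip(uncond, cond))
    # Uniform sampling: integral ~= width * sample mean; the identity carries a factor 1/2.
    return 0.5 * width * total / n_samples


def select_training_set(candidates: Dict[str, List[Vector]], denoiser: Denoiser,
                        k: int) -> List[Tuple[str, Vector]]:
    """For each prompt, keep the k generated samples with the highest estimated PMI.
    Every call reuses the same seed, so all candidates are scored with identical
    noise draws (common random numbers), which makes the ranking less noisy."""
    selected: List[Tuple[str, Vector]] = []
    for prompt, samples in candidates.items():
        ranked = sorted(samples, key=lambda s: estimate_pmi(s, prompt, denoiser),
                        reverse=True)
        selected.extend((prompt, s) for s in ranked[:k])
    return selected
```

With a real text-to-image model, `x` would presumably be an image latent and `denoiser` a text-conditioned noise-prediction network queried with and without the prompt; the toy vectors here only exercise the arithmetic.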

Authors (5)
  1. Chao Wang
  2. Giulio Franzese
  3. Alessandro Finamore
  4. Massimo Gallo
  5. Pietro Michiardi