
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners (2305.10722v3)

Published 18 May 2023 in cs.CV

Abstract: Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information, and fine-tunes the model via efficient attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several benchmark datasets, we demonstrate the potential of using pre-trained diffusion models for discriminative tasks, with superior results on few-shot image-text matching.
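To make the idea of scoring an image-text pair from cross-attention more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `attention_logits` and `matching_score`, the toy feature vectors, and the LogSumExp-then-mean pooling are all assumptions chosen for illustration. In a real setup the queries would come from the diffusion model's image latents and the keys from its text encoder.

```python
import numpy as np

def attention_logits(image_feats, text_feats):
    """Scaled dot-product logits between image patches (queries) and text tokens (keys)."""
    d = image_feats.shape[-1]
    return image_feats @ text_feats.T / np.sqrt(d)

def matching_score(image_feats, text_feats):
    """Hypothetical pooling: LogSumExp over text tokens, then mean over image patches."""
    logits = attention_logits(image_feats, text_feats)
    m = logits.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    lse = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))
    return float(lse.mean())

# Toy example (made-up features): two image patches aligned with caption A, orthogonal to caption B.
image = np.array([[1.0, 0.0], [1.0, 0.0]])
caption_a = np.array([[1.0, 0.0]])
caption_b = np.array([[0.0, 1.0]])

score_a = matching_score(image, caption_a)
score_b = matching_score(image, caption_b)
best = "A" if score_a > score_b else "B"
```

Image-text matching then reduces to picking the caption with the highest pooled score. The few-shot prompt learning mentioned in the abstract would adjust the features that feed this score, and is not shown here.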

Authors (9)
  1. Xuehai He (26 papers)
  2. Weixi Feng (14 papers)
  3. Tsu-Jui Fu (35 papers)
  4. Varun Jampani (125 papers)
  5. Arjun Akula (6 papers)
  6. Pradyumna Narayana (12 papers)
  7. Sugato Basu (16 papers)
  8. William Yang Wang (254 papers)
  9. Xin Eric Wang (74 papers)
Citations (7)