Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models (2303.17169v1)

Published 30 Mar 2023 in cs.CV

Abstract: Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an appropriate prompt for each specific task. Recent CoCoOp further boosts the base-to-new generalization performance via an image-conditional prompt. However, it directly fuses identical image semantics into prompts of different labels and significantly weakens the discrimination among different classes, as shown in our experiments. Motivated by this observation, we first propose a class-aware text prompt (CTP) to enrich generated prompts with label-related image information. Unlike CoCoOp, CTP can effectively involve image semantics and avoid introducing extra ambiguities into different prompts. On the other hand, instead of preserving the complete image representations, we propose text-guided feature tuning (TFT) to make the image branch attend to class-related representations. A contrastive loss is employed to align such augmented text and image representations on downstream tasks. In this way, the image-to-text CTP and text-to-image TFT can be mutually promoted to enhance the adaptation of VLMs to downstream tasks. Extensive experiments demonstrate that our method outperforms existing methods by a significant margin. In particular, compared to CoCoOp, we achieve an average improvement of 4.03% on new classes and 3.19% on harmonic mean over eleven classification benchmarks.
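The abstract mentions a contrastive loss that aligns the augmented text and image representations. As a rough illustration of what such an objective looks like, here is a minimal NumPy sketch of a symmetric, CLIP-style contrastive loss over paired image and text features. This is a hypothetical reconstruction, not the paper's exact formulation; the function name, temperature value, and averaging of the two directions are assumptions.

```python
import numpy as np

def contrastive_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric contrastive loss aligning paired image/text features.

    Hypothetical sketch of the alignment objective described in the abstract;
    the paper's actual loss may differ in details (temperature, weighting).
    """
    # L2-normalize so the dot product becomes cosine similarity.
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N) similarity matrix
    labels = np.arange(len(img))              # matching pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

# Usage: identical (perfectly aligned) features yield a lower loss
# than deliberately mispaired ones.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
aligned_loss = contrastive_alignment_loss(feats, feats)
mispaired_loss = contrastive_alignment_loss(feats, feats[::-1])
```

Pulling matched pairs together on the diagonal while pushing mismatched pairs apart is what lets the image-to-text CTP and text-to-image TFT branches reinforce each other during adaptation.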

Authors (8)
  1. Sifan Long
  2. Zhen Zhao
  3. Junkun Yuan
  4. Zichang Tan
  5. Jiangjiang Liu
  6. Luping Zhou
  7. Shengsheng Wang
  8. Jingdong Wang
Citations (2)