Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels? (2307.11978v1)

Published 22 Jul 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data. A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noise. This intrigues us to study the key reasons contributing to the robustness of the prompt tuning paradigm. We conducted extensive experiments to explore this property and find the key factors are: 1) the fixed classname tokens provide a strong regularization to the optimization of the model, reducing gradients induced by the noisy samples; 2) the powerful pre-trained image-text embedding that is learned from diverse and generic web data provides strong prior knowledge for image classification. Further, we demonstrate that noisy zero-shot predictions from CLIP can be used to tune its own prompt, significantly enhancing prediction accuracy in the unsupervised setting. The code is available at https://github.com/CEWu/PTNL.
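The first factor the abstract identifies — fixed classname tokens acting as a regularizer — can be illustrated with a minimal sketch. The snippet below is an assumption-laden stand-in, not the paper's implementation: random tensors replace CLIP's frozen encoders and classname embeddings, a mean-pool replaces the text transformer, and only the learnable context vectors (the "prompt") receive gradients from a possibly noisy label.

```python
import torch

torch.manual_seed(0)
D, n_ctx, n_cls = 32, 4, 3  # embedding dim, context length, number of classes

# Frozen stand-ins for CLIP components (hypothetical, not real CLIP weights)
classname_emb = torch.randn(n_cls, D)  # fixed classname token embeddings, no grad
image_feat = torch.nn.functional.normalize(torch.randn(1, D), dim=-1)

# Learnable context vectors shared across classes: the tunable prompt
ctx = torch.zeros(n_ctx, D, requires_grad=True)

def text_features(ctx):
    # Stand-in text encoder: mean-pool context tokens, add classname embedding
    pooled = ctx.mean(0, keepdim=True) + classname_emb  # (n_cls, D)
    return torch.nn.functional.normalize(pooled, dim=-1)

# CLIP-style scaled cosine-similarity logits
logits = 100.0 * image_feat @ text_features(ctx).t()
# Cross-entropy against a (possibly noisy) label
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([0]))
loss.backward()

# Gradients flow only into the context; classname embeddings stay fixed,
# which is the regularization effect the paper attributes to them.
assert ctx.grad is not None
assert classname_emb.grad is None
```

In the unsupervised setting described at the end of the abstract, the `torch.tensor([0])` target would be replaced by CLIP's own zero-shot prediction for the image.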

Authors (7)
  1. Cheng-En Wu (9 papers)
  2. Yu Tian (249 papers)
  3. Haichao Yu (11 papers)
  4. Heng Wang (136 papers)
  5. Pedro Morgado (21 papers)
  6. Yu Hen Hu (15 papers)
  7. Linjie Yang (48 papers)
Citations (11)