Training-Free Unsupervised Prompt for Vision-Language Models (2404.16339v1)
Abstract: Prompt learning has become the most effective paradigm for adapting large pre-trained vision-language models (VLMs) to downstream tasks. Recently, unsupervised prompt tuning methods such as UPL and POUF directly leverage pseudo-labels as supervisory information to fine-tune additional adaptation modules on unlabeled data. However, inaccurate pseudo-labels easily misguide the tuning process and result in poor representation capabilities. In light of this, we propose Training-Free Unsupervised Prompts (TFUP), which maximally preserves the inherent representation capabilities and enhances them with a residual connection to similarity-based prediction probabilities in a training-free and labeling-free manner. Specifically, we integrate both instance confidence and prototype scores to select representative samples, which are used to customize a reliable Feature Cache Model (FCM) for training-free inference. Then, we design a Multi-level Similarity Measure (MSM) that considers both feature-level and semantic-level similarities to calculate the distance between each test image and the cached samples; these distances weight the corresponding cached labels to generate similarity-based prediction probabilities. In this way, TFUP achieves surprising performance, even surpassing training-based methods on multiple classification datasets. Based on TFUP, we propose a training-based approach (TFUP-T) to further boost adaptation performance. In addition to the standard cross-entropy loss, TFUP-T adopts an additional marginal distribution entropy loss to constrain the model from a global perspective. TFUP-T achieves new state-of-the-art classification performance compared to unsupervised and few-shot adaptation approaches on multiple benchmarks. In particular, TFUP-T improves the classification accuracy of POUF by 3.3% on the most challenging DomainNet dataset.
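To make the pipeline described in the abstract concrete, below is a minimal, illustrative Python/PyTorch sketch of the training-free inference (cache-based, similarity-weighted prediction added residually to the zero-shot prediction) and of a marginal-entropy regularizer of the kind TFUP-T adds on top of cross-entropy. This is not the authors' code: the exponential affinity weighting, the equal mixing of feature-level and semantic-level similarities, the hyperparameters `beta` and `alpha`, and all function names are assumptions chosen for readability.

```python
import torch
import torch.nn.functional as F


def tfup_inference(image_feat, text_feats, cache_keys, cache_text_feats,
                   cache_values, beta=5.0, alpha=1.0):
    """Training-free prediction for a batch of test images.

    image_feat:       (B, D) L2-normalized image features of the test batch
    text_feats:       (C, D) L2-normalized text features, one per class
    cache_keys:       (N, D) L2-normalized image features of cached samples
    cache_text_feats: (N, D) text features of each cached sample's pseudo-label
    cache_values:     (N, C) one-hot pseudo-labels of the cached samples
    """
    # Zero-shot prediction from the frozen VLM (inherent representation preserved).
    zero_shot_logits = 100.0 * image_feat @ text_feats.t()          # (B, C)

    # Feature-level similarity: test image vs. cached image features.
    feat_sim = image_feat @ cache_keys.t()                          # (B, N)
    # Semantic-level similarity: test image vs. the text feature of each cached pseudo-label.
    sem_sim = image_feat @ cache_text_feats.t()                     # (B, N)

    # Multi-level similarity (equal mixing here), sharpened into cache weights.
    affinity = 0.5 * (feat_sim + sem_sim)
    weights = torch.exp(-beta * (1.0 - affinity))                   # (B, N)

    # Similarity-based prediction: weighted vote over cached pseudo-labels.
    cache_logits = weights @ cache_values                           # (B, C)

    # Residual connection: cache prediction added on top of the zero-shot prediction.
    return zero_shot_logits + alpha * cache_logits


def marginal_entropy_loss(logits):
    """Negative entropy of the batch-averaged class distribution.

    Minimizing this term maximizes the marginal entropy, i.e. it discourages
    the model from collapsing all predictions onto a few classes.
    """
    marginal = logits.softmax(dim=-1).mean(dim=0)                   # (C,)
    return (marginal * torch.log(marginal + 1e-8)).sum()


if __name__ == "__main__":
    torch.manual_seed(0)
    B, N, C, D = 4, 16, 10, 512
    img = F.normalize(torch.randn(B, D), dim=-1)
    txt = F.normalize(torch.randn(C, D), dim=-1)
    keys = F.normalize(torch.randn(N, D), dim=-1)
    pseudo = torch.randint(0, C, (N,))
    values = F.one_hot(pseudo, C).float()
    cache_txt = txt[pseudo]                  # text feature of each cached pseudo-label
    logits = tfup_inference(img, txt, keys, cache_txt, values)
    print(logits.shape, marginal_entropy_loss(logits).item())
```

Note that with `alpha = 0` (or an empty cache) the prediction reduces to plain zero-shot CLIP, which is how the residual formulation avoids degrading the frozen model's representation; the marginal-entropy term is written so that minimizing it spreads the batch-averaged prediction across classes, constraining the model from a global perspective.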
- Pseudo-labeling and confirmation bias in deep semi-supervised learning. In 2020 International Joint Conference on Neural Networks. IEEE, 2020.
- ReZero is all you need: Fast convergence at large depth. arXiv preprint arXiv:2003.04887, 2020.
- UNITER: Universal image-text representation learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, pages 104–120. Springer, 2020.
- Pseudo-labeling curriculum for unsupervised domain adaptation. arXiv preprint arXiv:1908.00262, 2019.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- An empirical study of training end-to-end vision-and-language transformers. In CVPR, pages 18166–18176, 2022.
- CLIP-Adapter: Better vision-language models with feature adapters. arXiv preprint arXiv:2110.04544, 2021.
- SoftCLIP: Softer cross-modal alignment makes CLIP stronger. arXiv preprint arXiv:2303.17561, 2023.
- Revisiting self-training for neural sequence generation. arXiv preprint arXiv:1909.13788, 2019.
- Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
- Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pages 2790–2799. PMLR, 2019.
- LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Unsupervised prompt learning for vision-language models. arXiv preprint arXiv:2204.03649, 2022.
- Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning, pages 4904–4916. PMLR, 2021.
- Self-training for end-to-end speech recognition. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7084–7088. IEEE, 2020.
- Discriminative clustering by regularized information maximization. Advances in neural information processing systems, 23, 2010.
- Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning, ICML, volume 3, page 896. Atlanta, 2013.
- The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, 2021.
- Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190, 2021.
- Scaling language-image pre-training via masking. In CVPR, pages 23390–23400, 2023.
- Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm. arXiv preprint arXiv:2110.05208, 2021.
- Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In International Conference on Machine Learning, pages 6028–6039. PMLR, 2020.
- GraphPrompt: Unifying pre-training and downstream tasks for graph neural networks. arXiv preprint arXiv:2302.08043, 2023.
- Effective self-training for parsing. In Proceedings of the human language technology conference of the NAACL, main conference, pages 152–159, 2006.
- Geoffrey J McLachlan. Iterative reclassification procedure for constructing an asymptotically optimal rule of allocation in discriminant analysis. Journal of the American Statistical Association, 70(350):365–369, 1975.
- Sree Hari Krishnan Parthasarathi and Nikko Strom. Lessons from building acoustic models with a million hours of speech. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6670–6674. IEEE, 2019.
- Moment matching for multi-source domain adaptation. In ICCV, pages 1406–1415, 2019.
- VisDA: The visual domain adaptation challenge. arXiv preprint arXiv:1710.06924, 2017.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- Semi-supervised self-training of object detection models. Workshops on Applications of Computer Vision, 2005.
- Adapting visual category models to new domains. In ECCV, pages 213–226. Springer, 2010.
- Information-theoretical learning of discriminative clusters for unsupervised domain adaptation. arXiv preprint arXiv:1206.6438, 2012.
- FixMatch: Simplifying semi-supervised learning with consistency and confidence. Advances in neural information processing systems, 33:596–608, 2020.
- A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757, 2020.
- POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models. In International Conference on Machine Learning, pages 33816–33832. PMLR, 2023.
- Deep hashing network for unsupervised domain adaptation. In CVPR, pages 5018–5027, 2017.
- Tent: Fully test-time adaptation by entropy minimization. arXiv preprint arXiv:2006.10726, 2020.
- End-to-end semi-supervised object detection with soft teacher. In ICCV, pages 3060–3069, 2021.
- Visual-language prompt tuning with knowledge-guided context optimization. In CVPR, pages 6757–6767, 2023.
- Tip-Adapter: Training-free CLIP-Adapter for better vision-language modeling. arXiv preprint arXiv:2111.03930, 2021.
- Instance-specific and model-adaptive supervision for semi-supervised semantic segmentation. In CVPR, pages 23705–23714, 2023.
- Augmentation matters: A simple-yet-effective approach to semi-supervised semantic segmentation. In CVPR, pages 11350–11359, 2023.
- Entropy-based optimization on individual and global predictions for semi-supervised learning. In Proceedings of the 31st ACM International Conference on Multimedia, pages 8346–8355, 2023.
- LaSSL: Label-guided self-training for semi-supervised learning. In AAAI, volume 36, pages 9208–9216, 2022.
- Conditional prompt learning for vision-language models. In CVPR, pages 16816–16825, 2022.
- Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022.
Authors: Sifan Long, Linbin Wang, Zhen Zhao, Zichang Tan, Yiming Wu, Shengsheng Wang, Jingdong Wang