
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models (2303.06571v2)

Published 12 Mar 2023 in cs.CV

Abstract: Prompt tuning, a recently emerging paradigm, enables powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way by learning "soft prompts" to condition frozen pre-training models. Though effective, prompt tuning is particularly problematic in the few-shot scenario, where its performance is sensitive to initialization and finding a good initialization requires a time-consuming search, restricting the fast-adaptation ability of pre-training models. In addition, prompt tuning can undermine the generalizability of pre-training models, because the learnable prompt tokens easily overfit to the limited training samples. To address these issues, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework that jointly meta-learns an efficient soft prompt initialization for better adaptation and a lightweight gradient-regulating function for strong cross-domain generalizability, using only unlabeled image-text pre-training data. Rather than designing a specific prompt tuning method, GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way, and comprehensive experiments show that GRAM brings consistent improvements to them in several settings (e.g., few-shot learning, cross-domain generalization, and cross-dataset generalization) across 11 datasets. Further, experiments show that GRAM enables the orthogonal methods of textual and visual prompt tuning to work in a mutually enhancing way, offering better generalizability than uni-modal prompt tuning methods.
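The abstract describes a bi-level meta-learning scheme: an inner loop adapts a soft prompt on a task while a learned function regulates the prompt gradient, and an outer loop updates both the prompt initialization and the regulator so the adapted prompt generalizes to held-out query data. The following is a minimal PyTorch-style sketch of that idea only; it assumes a frozen CLIP-like encoder supplies the task losses, and every name here (`GradientRegulator`, `inner_adapt`, `meta_step`) is hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GradientRegulator(nn.Module):
    """Lightweight, meta-learned transform of the raw prompt gradient
    (hypothetical design; the paper only states such a function is learned)."""
    def __init__(self, prompt_dim: int):
        super().__init__()
        self.gate = nn.Linear(prompt_dim, prompt_dim)

    def forward(self, grad: torch.Tensor) -> torch.Tensor:
        # Elementwise modulation of the gradient: one plausible way to damp
        # update directions that would overfit the few support samples.
        return torch.sigmoid(self.gate(grad)) * grad

def inner_adapt(prompt, regulator, support_loss_fn, inner_lr=0.1):
    """One inner-loop step: adapt the soft prompt on a task's support set,
    keeping the graph so the outer loop can differentiate through it."""
    loss = support_loss_fn(prompt)
    (grad,) = torch.autograd.grad(loss, prompt, create_graph=True)
    return prompt - inner_lr * regulator(grad)

def meta_step(prompt_init, regulator, tasks, meta_opt, inner_lr=0.1):
    """Outer loop (MAML-style): update the initialization and the regulator
    so that the adapted prompt generalizes to each task's query set."""
    meta_opt.zero_grad()
    meta_loss = torch.zeros(())
    for support_loss_fn, query_loss_fn in tasks:
        adapted = inner_adapt(prompt_init, regulator, support_loss_fn, inner_lr)
        meta_loss = meta_loss + query_loss_fn(adapted)
    meta_loss.backward()
    meta_opt.step()

# Usage sketch: 4 soft prompt tokens of width 512; the loss closures would
# be built from pseudo-tasks over unlabeled image-text pre-training data.
prompt_init = nn.Parameter(torch.randn(4, 512) * 0.02)
regulator = GradientRegulator(512)
meta_opt = torch.optim.Adam([prompt_init, *regulator.parameters()], lr=1e-3)
```

Because the regulated update is applied to the prompt tokens rather than to any particular encoder, a scheme of this shape stays model-agnostic, consistent with the abstract's claim that GRAM can wrap various textual or visual prompt tuning methods.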

Authors (10)
  1. Juncheng Li (121 papers)
  2. Minghe Gao (12 papers)
  3. Longhui Wei (40 papers)
  4. Siliang Tang (116 papers)
  5. Wenqiao Zhang (51 papers)
  6. Mengze Li (22 papers)
  7. Wei Ji (202 papers)
  8. Qi Tian (314 papers)
  9. Tat-Seng Chua (359 papers)
  10. Yueting Zhuang (164 papers)
Citations (14)