Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning (2211.05077v1)

Published 9 Nov 2022 in cs.CV

Abstract: This work explores the zero-shot compositional learning ability of large pre-trained vision-language models (VLMs) within the prompt-based learning framework and proposes a model (\textit{PromptCompVL}) to solve the compositional zero-shot learning (CZSL) problem. \textit{PromptCompVL} makes two design choices: first, it uses soft-prompting instead of hard-prompting to inject learnable parameters that reprogram VLMs for compositional learning. Second, to address the compositional challenge, it uses a soft-embedding layer to learn primitive concepts in different combinations. By combining soft-embedding and soft-prompting, \textit{PromptCompVL} achieves state-of-the-art performance on the MIT-States dataset. Furthermore, the proposed model achieves consistent improvements over other CLIP-based methods, which shows the effectiveness of the proposed prompting strategies for CZSL.
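To make the two design choices concrete, below is a minimal PyTorch sketch of soft-prompting plus soft primitive embeddings on top of a frozen CLIP text encoder. This is an illustration under stated assumptions, not the authors' released implementation: the class name `PromptCompSketch`, the context size `n_ctx`, and the embedding tables `attr_emb`/`obj_emb` are hypothetical, and only public attributes of the OpenAI CLIP model (`token_embedding`, `positional_embedding`, `transformer`, `ln_final`, `text_projection`, `logit_scale`) are used.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git


class PromptCompSketch(nn.Module):
    """Illustrative sketch: learnable soft prompt + soft primitive
    embeddings on a frozen CLIP backbone (not the authors' code)."""

    def __init__(self, clip_model, n_attrs, n_objs, n_ctx=8):
        super().__init__()
        self.clip = clip_model.float()
        for p in self.clip.parameters():          # keep the VLM frozen
            p.requires_grad_(False)
        dim = self.clip.token_embedding.embedding_dim
        self.n_ctx = n_ctx
        # Soft prompt: n_ctx learnable context vectors replacing a hard
        # prompt such as "a photo of".
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, dim))
        # Soft embeddings: one learnable vector per primitive concept,
        # shared across all attribute-object compositions.
        self.attr_emb = nn.Parameter(0.02 * torch.randn(n_attrs, dim))
        self.obj_emb = nn.Parameter(0.02 * torch.randn(n_objs, dim))

    def text_features(self, attr_idx, obj_idx):
        """Encode [SOS] ctx... attr obj [EOS] pad... for each pair."""
        B, dev = attr_idx.shape[0], self.ctx.device
        tok = self.clip.token_embedding
        sos = tok(torch.tensor([49406], device=dev))[None].expand(B, -1, -1)
        eos = tok(torch.tensor([49407], device=dev))[None].expand(B, -1, -1)
        eos_pos = self.n_ctx + 3                  # SOS + ctx + attr + obj
        n_pad = self.clip.context_length - eos_pos - 1
        pad = tok(torch.zeros(n_pad, dtype=torch.long, device=dev))
        seq = torch.cat([sos,
                         self.ctx[None].expand(B, -1, -1),
                         self.attr_emb[attr_idx].unsqueeze(1),
                         self.obj_emb[obj_idx].unsqueeze(1),
                         eos,
                         pad[None].expand(B, -1, -1)], dim=1)
        x = seq + self.clip.positional_embedding
        x = self.clip.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        # Take the representation at the EOS position, as CLIP does.
        x = self.clip.ln_final(x)[:, eos_pos] @ self.clip.text_projection
        return x / x.norm(dim=-1, keepdim=True)

    def forward(self, images, attr_idx, obj_idx):
        img = self.clip.encode_image(images)
        img = img / img.norm(dim=-1, keepdim=True)
        txt = self.text_features(attr_idx, obj_idx)
        # Similarity of each image to each candidate composition.
        return self.clip.logit_scale.exp() * img @ txt.t()


# Hypothetical usage; 115 attributes and 245 objects are the MIT-States sizes.
# model, preprocess = clip.load("ViT-B/32", device="cpu")
# sketch = PromptCompSketch(model, n_attrs=115, n_objs=245)
```

Only `ctx`, `attr_emb`, and `obj_emb` receive gradients here, which matches the abstract's premise of reprogramming a frozen VLM through learnable prompt and primitive-concept parameters rather than fine-tuning the backbone.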

Authors (3)
  1. Guangyue Xu (4 papers)
  2. Parisa Kordjamshidi (44 papers)
  3. Joyce Chai (52 papers)
Citations (9)