Text as Image: Learning Transferable Adapter for Multi-Label Classification (2312.04160v1)

Published 7 Dec 2023 in cs.CV

Abstract: Pre-trained vision-LLMs have notably accelerated the progress of open-world concept recognition. Their impressive zero-shot ability has recently been transferred to multi-label image classification via prompt tuning, enabling the discovery of novel labels in an open-vocabulary manner. However, this paradigm incurs non-trivial training costs and becomes computationally prohibitive for a large number of candidate labels. To address this issue, we note that vision-language pre-training aligns images and texts in a unified embedding space, making it possible for an adapter network to identify labels in the visual modality while being trained in the text modality. To enhance this cross-modal transfer ability, we propose a simple yet effective method termed random perturbation, which enables the adapter to cover potential visual embeddings by perturbing text embeddings with noise during training, resulting in better performance in the visual modality. Furthermore, we introduce an effective approach that employs LLMs for multi-label instruction-following text generation. In this way, a fully automated pipeline for visual label recognition is developed without relying on any manual data. Extensive experiments on public benchmarks show the superiority of our method in various multi-label classification tasks.
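The random-perturbation idea described above can be sketched in a few lines: during training, Gaussian noise is added to each text embedding so the adapter sees a neighborhood of the text point that is more likely to overlap with the corresponding visual embeddings. This is a minimal illustrative sketch, not the authors' implementation; the function name, the noise scale `sigma`, and the re-normalization step are assumptions consistent with CLIP-style L2-normalized embeddings.

```python
import numpy as np

def perturb_text_embeddings(text_emb, sigma=0.1, rng=None):
    """Illustrative random perturbation (hypothetical helper, not from
    the paper's code): add Gaussian noise to text embeddings so an
    adapter trained on text covers nearby visual embeddings."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=text_emb.shape)
    perturbed = text_emb + noise
    # Re-project onto the unit sphere, assuming CLIP-style
    # L2-normalized embeddings.
    return perturbed / np.linalg.norm(perturbed, axis=-1, keepdims=True)
```

At inference the adapter is fed (unperturbed) image embeddings from the same shared space, which is what makes the text-only training transferable.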

Authors (10)
  1. Xuelin Zhu (8 papers)
  2. Jiuxin Cao (18 papers)
  3. Dongqi Tang (9 papers)
  4. Furong Xu (22 papers)
  5. Weijia Liu (9 papers)
  6. Jiawei Ge (15 papers)
  7. Bo Liu (484 papers)
  8. Qingpei Guo (27 papers)
  9. Tianyi Zhang (262 papers)
  10. Jian Liu (404 papers)
Citations (1)