
TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification (2309.06809v1)

Published 13 Sep 2023 in cs.CV

Abstract: Vision and Language Models (VLMs), such as CLIP, have enabled visual recognition of a potentially unlimited set of categories described by text prompts. However, for the best visual recognition performance, these models still require tuning to better fit the data distributions of the downstream tasks, in order to overcome the domain shift from the web-based pre-training data. Recently, it has been shown that it is possible to effectively tune VLMs without any paired data, and in particular to improve VLMs' visual recognition performance using text-only training data generated by large language models (LLMs). In this paper, we dive deeper into this exciting text-only VLM training approach and explore ways it can be significantly further improved by taking the specifics of the downstream task into account when sampling text data from LLMs. In particular, compared to the SOTA text-only VLM training approach, we demonstrate up to an 8.4% performance improvement in (cross-)domain-specific adaptation, up to an 8.7% improvement in fine-grained recognition, and a 3.1% overall average improvement in zero-shot classification, all against strong baselines.
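The core idea described in the abstract, steering an LLM's generation toward the downstream task and then building a classifier from the generated text alone, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact recipe: the prompt template, the `generate_sentences` placeholder, and the use of mean-pooled CLIP text embeddings as per-class classifier weights are all hypothetical choices for exposition.

```python
# Minimal sketch of task-targeted, text-only adaptation for a CLIP-style VLM.
# Assumptions: `generate_sentences` stands in for an actual LLM call, and the
# targeted prompt template is illustrative, not the paper's exact wording.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def targeted_prompt(class_name: str, task_hint: str) -> str:
    # Task-adaptive ("targeted") prompting: task_hint steers the LLM toward
    # the downstream domain (e.g. "aerial satellite imagery").
    return (f"Describe what a photo of a {class_name} looks like "
            f"in the context of {task_hint}.")

def generate_sentences(prompt: str, n: int = 8) -> list[str]:
    # Placeholder for querying an LLM; in practice this would sample n
    # textual training instances from the model's completion API.
    raise NotImplementedError

@torch.no_grad()
def class_text_embedding(sentences: list[str]) -> torch.Tensor:
    # Embed the LLM-generated sentences with CLIP's text encoder and
    # mean-pool the normalized features into one weight vector per class.
    inputs = tokenizer(sentences, padding=True, truncation=True,
                       return_tensors="pt")
    feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)
```

With per-class weight vectors built this way, zero-shot classification reduces to taking the cosine similarity between an image's CLIP embedding and each class vector; the text-only tuning the paper studies refines the VLM against such generated text rather than against paired image data.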

Authors (6)
  1. M. Jehanzeb Mirza (15 papers)
  2. Leonid Karlinsky (79 papers)
  3. Wei Lin (207 papers)
  4. Horst Possegger (35 papers)
  5. Rogerio Feris (105 papers)
  6. Horst Bischof (53 papers)
Citations (6)