Enhancing Visual Classification using Comparative Descriptors (2411.05357v2)

Published 8 Nov 2024 in cs.CV

Abstract: The performance of vision-language models (VLMs), such as CLIP, in visual classification tasks has been enhanced by leveraging semantic knowledge from large language models (LLMs), such as GPT. Recent studies have shown that in zero-shot classification tasks, descriptors incorporating additional cues, high-level concepts, or even random characters often outperform those using only the category name. In many classification tasks, the top-1 accuracy may be relatively low while the top-5 accuracy is often significantly higher. This gap implies that most misclassifications occur among a few similar classes, highlighting the model's difficulty in distinguishing between classes with subtle differences. To address this challenge, we introduce the concept of comparative descriptors. These descriptors emphasize the unique features of a target class against its most similar classes, enhancing differentiation. By generating these comparative descriptors and integrating them into the classification framework, we refine the semantic focus and improve classification accuracy. An additional filtering process ensures that these descriptors are closer to the image embeddings in the CLIP space, further enhancing performance. Our approach demonstrates improved accuracy and robustness in visual classification tasks by addressing the specific challenge of subtle inter-class differences.
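
The pipeline outlined in the abstract (find each class's most similar classes, filter descriptors by closeness to image embeddings, classify with the retained descriptors) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, plain NumPy arrays stand in for CLIP text and image embeddings, the filter is assumed to rank descriptors by cosine similarity to a mean image embedding, and the class score is assumed to be the mean descriptor similarity. The abstract does not specify any of these details.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def top_k_similar_classes(class_emb, k):
    """For each class, return the indices of its k most similar other classes.

    class_emb: (num_classes, dim) array, assumed to be class-name text embeddings.
    """
    e = l2_normalize(class_emb)
    sim = e @ e.T
    np.fill_diagonal(sim, -np.inf)  # a class is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def filter_descriptors(desc_emb, image_emb, keep):
    """Keep the `keep` descriptors closest to the mean image embedding.

    Assumption: closeness is cosine similarity to the mean of a set of
    reference image embeddings; the paper's exact criterion may differ.
    """
    d = l2_normalize(desc_emb)
    m = l2_normalize(image_emb.mean(axis=0))
    scores = d @ m
    return np.sort(np.argsort(-scores)[:keep])


def classify(image_emb, class_desc_embs):
    """Score each class by the mean cosine similarity between one image
    embedding and that class's descriptor embeddings (an assumed rule)."""
    img = l2_normalize(image_emb)
    scores = np.array([(l2_normalize(d) @ img).mean() for d in class_desc_embs])
    return int(np.argmax(scores)), scores
```

In practice the neighbor indices from `top_k_similar_classes` would be used to prompt an LLM for descriptors that contrast a target class with those neighbors; that prompting step is omitted here because the abstract gives no prompt format.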

Authors (6)
  1. Hankyeol Lee (2 papers)
  2. Gawon Seo (1 paper)
  3. Wonseok Choi (7 papers)
  4. Geunyoung Jung (3 papers)
  5. Kyungwoo Song (38 papers)
  6. Jiyoung Jung (6 papers)