
Zero-Shot Visual Classification with Guided Cropping (2309.06581v1)

Published 12 Sep 2023 in cs.CV

Abstract: Pretrained vision-language models, such as CLIP, show promising zero-shot performance across a wide variety of datasets. For closed-set classification tasks, however, there is an inherent limitation: CLIP image encoders are typically designed to extract generic image-level features, which also summarize information that is superfluous or confounding for the target task. This degrades classification performance, especially when objects of interest cover small areas of the input images. In this work, we propose CLIP with Guided Cropping (GC-CLIP), where we use an off-the-shelf zero-shot object detection model in a preprocessing step to increase the focus of the zero-shot classifier on the object of interest and to minimize the influence of extraneous image regions. We empirically show that our approach improves zero-shot classification results across architectures and datasets, most notably for small objects.
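
The preprocessing idea in the abstract (detect the object, crop around it, then classify the crop) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `expand_box` and `zero_shot_classify`, the `margin` parameter, and the toy feature vectors are all assumptions made for illustration, and the detector and the CLIP encoders are replaced by hand-written numbers so that the example stays self-contained.

```python
import math

def expand_box(box, image_size, margin=0.1):
    """Enlarge a detected box by a relative margin and clamp it to the image.

    box: (x0, y0, x1, y1) from a hypothetical zero-shot detector.
    image_size: (width, height) of the input image.
    Returns integer crop coordinates (left, top, right, bottom).
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    pad_x = margin * (x1 - x0)
    pad_y = margin * (y1 - y0)
    left = max(0, math.floor(x0 - pad_x))
    top = max(0, math.floor(y0 - pad_y))
    right = min(width, math.ceil(x1 + pad_x))
    bottom = min(height, math.ceil(y1 + pad_y))
    return left, top, right, bottom

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_feature, text_features):
    """Return the index of the class prompt most similar to the image feature."""
    scores = [cosine(image_feature, t) for t in text_features]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy usage: a small object in a 100x100 image (all values are made up).
crop_box = expand_box((40, 40, 60, 60), (100, 100), margin=0.5)

# Stand-ins for encoder outputs; a real pipeline would encode the cropped
# image and one text prompt per class with a vision-language model.
crop_feature = [0.9, 0.1]
class_prompt_features = [[0.0, 1.0], [1.0, 0.0]]
predicted = zero_shot_classify(crop_feature, class_prompt_features)
```

In this sketch the margin keeps some context around the detected object, and clamping prevents the crop from leaving the image. How the paper actually chooses boxes and margins is not stated in the abstract, so those choices here are placeholders.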

Authors (4)
  1. Piyapat Saranrittichai
  2. Mauricio Munoz
  3. Volker Fischer
  4. Chaithanya Kumar Mummadi
Citations (1)