
CountCLIP -- [Re] Teaching CLIP to Count to Ten (2406.03586v2)

Published 5 Jun 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Large vision-language models (VLMs) are shown to learn rich joint image-text representations, enabling high performance on relevant downstream tasks. However, they fail to demonstrate quantitative understanding of objects and lack good counting-aware representations. This paper conducts a reproducibility study of 'Teaching CLIP to Count to Ten' (Paiss et al., 2023), which presents a method to finetune a CLIP model (Radford et al., 2021) to improve zero-shot counting accuracy in an image while maintaining performance on zero-shot classification by introducing a counting-contrastive loss term. We improve the model's performance on a smaller subset of their training data with lower computational resources. We verify these claims by reproducing their study with our own code. The implementation can be found at https://github.com/SforAiDl/CountCLIP.
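
The counting-contrastive term mentioned in the abstract pairs each image with a caption stating the correct object count and a counterfactual caption in which only the number word is changed. The sketch below is a minimal illustration of such a loss, not the authors' implementation; the function name, the temperature value, and the assumption of L2-normalized embeddings are ours.

```python
import torch
import torch.nn.functional as F

def counting_contrastive_loss(image_emb: torch.Tensor,
                              correct_caption_emb: torch.Tensor,
                              counterfactual_caption_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Two-way contrastive loss: prefer the caption with the correct count
    over a counterfactual caption whose number word has been swapped.

    All embeddings are assumed L2-normalized with shape (batch, dim);
    names and the temperature value are illustrative, not the paper's.
    """
    # Cosine similarity reduces to a dot product for unit-norm embeddings.
    sim_correct = (image_emb * correct_caption_emb).sum(dim=-1) / temperature
    sim_counterfactual = (image_emb * counterfactual_caption_emb).sum(dim=-1) / temperature

    # Softmax over {correct, counterfactual}; the correct caption is index 0.
    logits = torch.stack([sim_correct, sim_counterfactual], dim=-1)
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)
```

In training, a term of this form would typically be added to the standard CLIP image-text contrastive loss with a weighting hyperparameter, e.g. `loss = clip_loss + lambda_count * counting_loss`, so that counting ability improves without sacrificing zero-shot classification.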

Authors (5)
  1. Harshvardhan Mestha (3 papers)
  2. Karan Bania (5 papers)
  3. Shreyas V (4 papers)
  4. Yash Bhisikar (2 papers)
  5. Tejas Agrawal (1 paper)