
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation (2407.08268v1)

Published 11 Jul 2024 in cs.CV

Abstract: CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level alignment training, which affects its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature correlations, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach demonstrates notable improvements in segmentation accuracy and the ability to maintain semantic coherence across objects. Experiments show that we are 22.3% ahead of CLIP on average on 9 segmentation benchmarks, outperforming existing state-of-the-art training-free methods. The code is made publicly available at: https://github.com/leaves162/CLIPtrase.
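The core mechanism the abstract describes is recalibrating the self-correlation among CLIP patch features so that each patch attends to semantically similar patches rather than being dominated by a few "global" ones. The sketch below is a minimal, hypothetical illustration of that idea with NumPy (cosine self-correlation followed by a softmax-weighted feature pooling); it is not the authors' exact CLIPtrase implementation, and the function names and temperature parameter are assumptions.

```python
import numpy as np

def patch_self_correlation(patch_feats):
    """Cosine self-correlation among patch features.

    patch_feats: (N, D) array of patch embeddings (e.g. from a CLIP
    vision encoder). Returns an (N, N) similarity matrix.
    """
    norms = np.linalg.norm(patch_feats, axis=1, keepdims=True)
    unit = patch_feats / norms
    return unit @ unit.T

def recalibrate(patch_feats, temperature=0.1):
    """Reweight each patch feature by a softmax over its own
    self-correlation row, pooling information from similar patches.
    A conceptual sketch only; the temperature value is illustrative.
    """
    corr = patch_self_correlation(patch_feats)
    weights = np.exp(corr / temperature)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ patch_feats

# Toy example: 4 patches with 8-dimensional features.
feats = np.random.default_rng(0).normal(size=(4, 8))
out = recalibrate(feats)
print(out.shape)  # (4, 8)
```

In a full training-free pipeline, such recalibrated patch features would then be compared against CLIP text embeddings of class names to produce per-patch segmentation labels.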

Authors (4)
  1. Tong Shao (3 papers)
  2. Zhuotao Tian (38 papers)
  3. Hang Zhao (156 papers)
  4. Jingyong Su (16 papers)
Citations (9)