Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 152 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 22 tok/s Pro
GPT-5 High 24 tok/s Pro
GPT-4o 94 tok/s Pro
Kimi K2 212 tok/s Pro
GPT OSS 120B 430 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

Disentangling CLIP for Multi-Object Perception (2502.02977v3)

Published 5 Feb 2025 in cs.CV

Abstract: Vision-LLMs like CLIP excel at recognizing the single, prominent object in a scene. However, they struggle in complex scenes containing multiple objects. We identify a fundamental reason behind this limitation: VLMs features space exhibits significant semantic entanglement, where features of one class contain substantial information about other unrelated classes, a phenomenon we term mutual feature information (MFI). This entanglement becomes evident during class-specific queries, as unrelated objects are activated alongside the queried class. To address this limitation, we propose DCLIP, a framework that disentangles CLIP features using two complementary objectives: a novel MFI Loss that orthogonalizes the text (class) features to reduce inter-class similarity, and the Asymmetric Loss (ASL) that aligns image features with the disentangled text features. Our experiment demonstrates that DCLIP reduces inter-class feature similarity by 30\% compared to CLIP, leading to significant performance gains on multi-label recognition (MLR) and zero-shot semantic segmentation (ZS3). In MLR, DCLIP outperforms SOTA approaches on VOC2007 and COCO-14 while using 75\% fewer parameters, and surpasses SOTA ZS3 methods by 3.4 mIoU on VOC2012 and 2.8 mIoU on COCO-17. These results establish feature disentanglement as a critical factor for effective multi-object perception in vision-LLMs.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Questions

We haven't generated a list of open questions mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube