Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 194 tok/s
Gemini 2.5 Pro 47 tok/s Pro
GPT-5 Medium 36 tok/s Pro
GPT-5 High 36 tok/s Pro
GPT-4o 106 tok/s Pro
Kimi K2 183 tok/s Pro
GPT OSS 120B 458 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation (2402.13697v1)

Published 21 Feb 2024 in cs.CV

Abstract: Zero-shot Panoptic Segmentation (ZPS) aims to recognize foreground instances and background stuff without images containing unseen categories in training. Due to the visual data sparsity and the difficulty of generalizing from seen to unseen categories, this task remains challenging. To better generalize to unseen classes, we propose Conditional tOken aligNment and Cycle trAnsiTion (CONCAT), to produce generalizable semantic vision queries. First, a feature extractor is trained by CON to link the vision and semantics for providing target queries. Formally, CON is proposed to align the semantic queries with the CLIP visual CLS token extracted from complete and masked images. To address the lack of unseen categories, a generator is required. However, one of the gaps in synthesizing pseudo vision queries, ie, vision queries for unseen categories, is describing fine-grained visual details through semantic embeddings. Therefore, we approach CAT to train the generator in semantic-vision and vision-semantic manners. In semantic-vision, visual query contrast is proposed to model the high granularity of vision by pulling the pseudo vision queries with the corresponding targets containing segments while pushing those without segments away. To ensure the generated queries retain semantic information, in vision-semantic, the pseudo vision queries are mapped back to semantic and supervised by real semantic embeddings. Experiments on ZPS achieve a 5.2% hPQ increase surpassing SOTA. We also examine inductive ZPS and open-vocabulary semantic segmentation and obtain comparative results while being 2 times faster in testing.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.