Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation (2306.11087v1)
Abstract: We study universal zero-shot segmentation, which aims to achieve panoptic, instance, and semantic segmentation for novel categories without any training samples. Such zero-shot segmentation ability relies on inter-class relationships in semantic space to transfer the visual knowledge learned from seen categories to unseen ones. It is therefore desirable to bridge the semantic and visual spaces well and to apply the semantic relationships to visual feature learning. We introduce a generative model that synthesizes features for unseen categories, which links the semantic and visual spaces and addresses the lack of unseen training data. Furthermore, we mitigate the domain gap between semantic and visual spaces in two ways. First, we enhance the vanilla generator with learned primitives, each of which contains fine-grained attributes related to categories, and synthesize unseen features by selectively assembling these primitives. Second, we propose to disentangle the visual feature into a semantic-related part and a semantic-unrelated part, where the latter contains useful visual classification clues but is less relevant to semantic representation. The inter-class relationships of semantic-related visual features are then required to be aligned with those in semantic space, thereby transferring semantic knowledge to visual feature learning. The proposed approach achieves state-of-the-art performance on zero-shot panoptic segmentation, instance segmentation, and semantic segmentation. Code is available at https://henghuiding.github.io/PADing/.
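The two ideas in the abstract, assembling a feature from primitives and aligning inter-class relationships across spaces, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`assemble_from_primitives`, `relation_matrix`, `alignment_loss`), the softmax weighting, the cosine-similarity relation matrix, and the mean-squared-error penalty are all assumptions chosen for illustration. The sketch also assumes that semantic embeddings and primitives share one dimensionality, whereas a real model would likely use learned projections.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assemble_from_primitives(semantic, primitives, temperature=1.0):
    """Hypothetical primitive assembly: weight each primitive by its
    dot-product affinity to the semantic embedding, then mix them."""
    scores = [
        sum(s * p for s, p in zip(semantic, prim)) / temperature
        for prim in primitives
    ]
    weights = softmax(scores)
    dim = len(primitives[0])
    feature = [
        sum(w * prim[d] for w, prim in zip(weights, primitives))
        for d in range(dim)
    ]
    return feature, weights


def relation_matrix(vectors):
    """Pairwise cosine similarities, one row and column per category."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def alignment_loss(semantic_vecs, visual_vecs):
    """Assumed penalty: mean squared difference between the inter-class
    relation matrices of the semantic and visual spaces."""
    rel_sem = relation_matrix(semantic_vecs)
    rel_vis = relation_matrix(visual_vecs)
    n = len(rel_sem)
    return sum(
        (rel_sem[i][j] - rel_vis[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    ) / (n * n)


# Toy data: 6 random primitives and 3 category embeddings, all 4-dimensional.
rng = random.Random(0)
primitives = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
semantics = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]

# Synthesize one "visual" feature per category, then measure how far the
# visual inter-class relations are from the semantic ones.
synthesized = [assemble_from_primitives(s, primitives)[0] for s in semantics]
print("alignment loss:", alignment_loss(semantics, synthesized))
```

In a trainable version, the primitives would be learned parameters and a penalty of this kind would be minimized alongside the segmentation objective. The abstract does not give the exact form of either component.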