Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning (2402.17251v1)
Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object pairs based on a limited set of observed examples. Current CZSL methods, despite their advances, tend to neglect the differing levels of specificity among attributes. For instance, given images of sliced strawberries, they may fail to prioritize 'Sliced-Strawberry' over a generic 'Red-Strawberry', even though the former is more informative. They also suffer from a ballooning search space when shifting from Closed-World (CW) to Open-World (OW) CZSL. To address these issues, we introduce the Context-based and Diversity-driven Specificity learning framework for CZSL (CDS-CZSL). Our framework evaluates the specificity of attributes by considering the diversity of objects they apply to and their related context. This approach yields more accurate predictions by emphasizing specific attribute-object pairs, and it improves composition filtering in OW-CZSL. We conduct experiments in both CW and OW scenarios, and our model achieves state-of-the-art results across three datasets.
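The abstract says specificity is judged partly by "the diversity of objects they apply to" but does not give the measure. As an illustration only, the sketch below assumes diversity is the normalized Shannon entropy of the objects an attribute co-occurs with in training pairs, and treats specificity as one minus that value. The function names `object_diversity` and `specificity` are hypothetical, the context component of CDS-CZSL is not modeled, and the toy pairs are made up rather than taken from the paper's datasets.

```python
import math
from collections import Counter, defaultdict


def object_diversity(pairs):
    """Hypothetical measure: normalized Shannon entropy of the objects each attribute appears with."""
    objs_by_attr = defaultdict(Counter)
    for attr, obj in pairs:
        objs_by_attr[attr][obj] += 1
    n_objects = len({obj for _, obj in pairs})
    diversity = {}
    for attr, counts in objs_by_attr.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        # Divide by log(n_objects), the maximum possible entropy, so scores lie in [0, 1].
        diversity[attr] = entropy / math.log(n_objects) if n_objects > 1 else 0.0
    return diversity


def specificity(pairs):
    """Assumed definition: attributes spread over fewer, less varied objects score higher."""
    return {attr: 1.0 - d for attr, d in object_diversity(pairs).items()}


# Toy (attribute, object) pairs, invented for illustration.
pairs = [
    ("red", "strawberry"), ("red", "apple"), ("red", "car"), ("red", "wine"),
    ("sliced", "strawberry"), ("sliced", "apple"),
]
scores = specificity(pairs)
```

Under these assumptions "red" spreads evenly over all four objects, so its specificity is close to 0, while "sliced" covers only two of them and scores 0.5. This matches the abstract's intuition that 'Sliced-Strawberry' is more informative than 'Red-Strawberry'.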