Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation (2306.08487v2)
Abstract: Zero-Shot Learning (ZSL), which aims to automatically recognize unseen objects, is a promising learning paradigm that lets machines continuously acquire new real-world knowledge. Recently, the Knowledge Graph (KG) has proven to be an effective scheme for handling the zero-shot task with large-scale and non-attribute data. Prior studies typically embed the relationships between seen and unseen objects, taken from existing knowledge graphs, into visual information to improve recognition of unseen data. However, real-world knowledge is naturally composed of multimodal facts. Compared with ordinary structural knowledge from a graph perspective, a multimodal KG can provide cognitive systems with fine-grained knowledge. For example, a text description and visual content can capture more critical details of a fact than knowledge triplets alone. Unfortunately, this multimodal fine-grained knowledge is largely unexploited because aligning features across modalities is a bottleneck. To that end, we propose a multimodal intensive ZSL framework that matches image regions with their corresponding semantic embeddings via a dense attention module and a self-calibration loss. This lets the semantic transfer process of our ZSL framework learn more differentiated knowledge between entities. Our model also overcomes the performance limitation of relying only on coarse global features. We conduct extensive experiments and evaluate our model on large-scale real-world data. The experimental results demonstrate the effectiveness of the proposed model in standard zero-shot classification tasks.
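To make the idea of matching image regions with semantic embeddings more concrete, the following is a minimal sketch of generic scaled dot-product attention between region features and class embeddings. It is not the authors' module: the function name `dense_attention_scores`, the shared feature dimension, the 49-region grid, and the random inputs are all assumptions made purely for illustration, and the self-calibration loss is not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention_scores(regions, class_embeddings):
    """Hypothetical helper: attend over image regions once per class.

    regions:          (R, d) array, one feature vector per image region.
    class_embeddings: (C, d) array, one semantic embedding per class,
                      assumed to be already projected into the same space.
    Returns the (C, R) attention map and a (C,) compatibility score.
    """
    d = regions.shape[1]
    # Similarity of every class embedding with every region, scaled by sqrt(d).
    logits = class_embeddings @ regions.T / np.sqrt(d)      # (C, R)
    # Each class distributes its attention over the regions.
    attn = softmax(logits, axis=1)                          # (C, R)
    # Class-specific visual summary: attention-weighted sum of regions.
    attended = attn @ regions                               # (C, d)
    # Compatibility between each class and its attended visual summary.
    scores = np.sum(attended * class_embeddings, axis=1)    # (C,)
    return attn, scores

# Toy inputs (assumed): a 7x7 grid of regions and 5 candidate classes.
rng = np.random.default_rng(0)
regions = rng.normal(size=(49, 16))
class_embeddings = rng.normal(size=(5, 16))

attn, scores = dense_attention_scores(regions, class_embeddings)
predicted = int(np.argmax(scores))
```

In this sketch each class builds its own region-weighted summary of the image instead of sharing a single global feature, which is the contrast with coarse global features that the abstract draws.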
Authors: Likang Wu, Zhi Li, Hongke Zhao, Zhefeng Wang, Qi Liu, Baoxing Huai, Nicholas Jing Yuan, Enhong Chen