Zero-Shot Relational Learning for Multimodal Knowledge Graphs (2404.06220v2)
Abstract: Relational learning is an essential task in knowledge representation, particularly in knowledge graph completion (KGC). While relational learning in traditional single-modal settings has been extensively studied, exploring it in a multimodal KGC context presents distinct challenges and opportunities. A major challenge is inference on newly discovered relations that have no associated training data. This zero-shot relational learning scenario places a unique requirement on multimodal KGC: multimodal information must be used to facilitate relational learning. However, existing works fail to leverage multimodal information and leave this problem unexplored. In this paper, we propose a novel end-to-end framework consisting of three components, namely a multimodal learner, a structure consolidator, and a relation embedding generator, which integrates diverse multimodal information and knowledge graph structures to facilitate zero-shot relational learning. Evaluation results on three multimodal knowledge graphs demonstrate the superior performance of our proposed method.
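The abstract names the three components but does not describe their internals. As a rough illustration only, the toy sketch below shows the generic zero-shot idea they serve: an unseen relation's embedding is produced from auxiliary (text and image) features and graph structure rather than learned from training triples, and is then used to rank candidate tail entities. The functions `multimodal_learner`, `structure_consolidator`, and `relation_embedding_generator` are hypothetical stand-ins named after the components, implemented with simple averaging and a linear projection; the TransE-style scorer is likewise an assumption, not the paper's method.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def multimodal_learner(text_feat: Vector, image_feat: Vector) -> Vector:
    """Hypothetical fusion step: element-wise mean of text and image features."""
    return [(t + i) / 2 for t, i in zip(text_feat, image_feat)]


def structure_consolidator(fused: Vector, neighbor_embs: List[Vector]) -> Vector:
    """Hypothetical step: add the mean of structural (neighbor) embeddings."""
    if not neighbor_embs:
        return list(fused)
    n = len(neighbor_embs)
    mean_nb = [sum(col) / n for col in zip(*neighbor_embs)]
    return [f + m for f, m in zip(fused, mean_nb)]


def relation_embedding_generator(features: Vector, weights: Matrix) -> Vector:
    """Hypothetical generator: a single linear projection W @ x."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def transe_score(head: Vector, rel: Vector, tail: Vector) -> float:
    """TransE-style plausibility: negative L1 norm of (h + r - t); higher is better."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, rel, tail))


def rank_tails(head: Vector, rel: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Rank candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, rel, candidates[name]),
        reverse=True,
    )


# Usage: the unseen relation has no training triples; its embedding is built
# only from toy text/image features plus one neighbor's structural embedding.
fused = multimodal_learner([1.0, 0.0], [1.0, 0.0])
consolidated = structure_consolidator(fused, [[0.0, 0.0]])
rel_emb = relation_embedding_generator(consolidated, [[1.0, 0.0], [0.0, 1.0]])
ranking = rank_tails(
    [0.0, 0.0], rel_emb, {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 0.0]}
)
print(ranking)  # ['a', 'c', 'b']
```

The point of the design is that nothing relation-specific is trained: swapping in a different relation only changes the auxiliary features fed to the pipeline, which is what makes inference on unseen relations possible at all.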