Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation (2404.09468v2)
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given knowledge graphs by jointly leveraging the structural information in the triples and the multi-modal information of the entities, thereby mitigating the graphs' inherent incompleteness. Existing MMKGC methods usually extract multi-modal features with pre-trained models, which handles multi-modal entity information coarsely and overlooks nuanced, fine-grained semantic details and their complex interactions. To address this shortfall, we introduce MyGO, a novel framework that tokenizes, fuses, and augments the fine-grained multi-modal representations of entities to improve MMKGC performance. Motivated by tokenization techniques, MyGO tokenizes multi-modal entity information into fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder. To further augment the multi-modal representations, MyGO incorporates fine-grained contrastive learning to highlight the specificity of the entity representations. Experiments on standard MMKGC benchmarks show that our method outperforms 19 recent models. Code and data are available at https://github.com/zjukg/MyGO.
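The contrastive component mentioned in the abstract can be illustrated with a generic in-batch InfoNCE loss. This is a minimal sketch rather than MyGO's actual objective: the function names `l2_normalize` and `info_nce`, the temperature value, and the toy vectors are assumptions made for illustration, and the abstract does not specify how the two views of an entity are produced. The idea shown is only that two representations of the same entity are pulled together while representations of other entities in the batch act as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean in-batch InfoNCE loss.

    Row i of view_a and row i of view_b are treated as two views of the
    same entity (the positive pair); every other row of view_b is a negative.
    The temperature is a hypothetical default, not a value from the paper.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                  for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits))
        # Negative log-probability of picking the positive candidate.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy batch of three entities with two slightly different views each.
entities_view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
entities_view_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned_loss = info_nce(entities_view_a, entities_view_b)
# Rotating the second view by one position misaligns every positive pair.
shuffled_loss = info_nce(entities_view_a,
                         entities_view_b[1:] + entities_view_b[:1])
print(aligned_loss < shuffled_loss)
```

In this toy setting the correctly paired views yield a lower loss than the misaligned ones, which is the behavior a contrastive objective is meant to reward.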
- Yichi Zhang
- Zhuo Chen
- Lingbing Guo
- Yajing Xu
- Binbin Hu
- Ziqi Liu
- Huajun Chen
- Wen Zhang