
Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning (2307.03591v1)

Published 6 Jul 2023 in cs.AI and cs.IR

Abstract: Multimodal knowledge graphs (MKGs), which organize information across various modalities, can benefit practical downstream tasks such as recommendation systems and visual question answering. However, most MKGs are still far from complete, which has motivated the development of MKG reasoning models. Recently, with the development of general artificial architectures, pretrained transformer models have drawn increasing attention, especially in multimodal scenarios. However, research on multimodal pretrained transformers (MPTs) for knowledge graph reasoning (KGR) is still at an early stage. The rich structural information underlying an MKG, which is the biggest difference between MKGs and other multimodal data, still cannot be fully leveraged by existing MPT models. Most of them use the graph structure only as a retrieval map for matching images and texts connected to the same entity, which hinders their reasoning performance. To this end, we propose the graph Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT. Specifically, a graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two different strategies, i.e., weighted summation and alignment constraint, is designed to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR that mines the structural information underlying the knowledge graph. Extensive experiments on FB15k-237-IMG and WN18-IMG demonstrate that SGMPT outperforms existing state-of-the-art models and confirm the effectiveness of the designed strategies.
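
The abstract names two fusion strategies, weighted summation and alignment constraint, without giving their formulas. The following is a minimal sketch of what such strategies could look like on plain feature vectors. It is not the authors' implementation: the function names, the mixing weight `alpha`, and the choice of a cosine-based alignment term are all assumptions made for illustration.

```python
import math


def weighted_sum_fusion(modal_feat, struct_feat, alpha=0.5):
    """Blend a textual or visual feature with a structural feature.

    `alpha` is a hypothetical mixing weight in [0, 1]; the paper's actual
    weighting scheme is not specified in the abstract.
    """
    if len(modal_feat) != len(struct_feat):
        raise ValueError("features must have the same dimension")
    return [(1 - alpha) * m + alpha * s for m, s in zip(modal_feat, struct_feat)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(modal_feat, struct_feat):
    """Assumed alignment penalty: 1 - cosine similarity.

    The value is 0 when the two features point in the same direction and
    grows as they diverge, so minimizing it pulls the modality feature
    toward the structural one.
    """
    return 1.0 - cosine_similarity(modal_feat, struct_feat)


if __name__ == "__main__":
    text_feat = [1.0, 0.0]    # toy textual feature
    struct_feat = [0.0, 1.0]  # toy structural feature
    print(weighted_sum_fusion(text_feat, struct_feat, alpha=0.5))
    print(alignment_loss(text_feat, struct_feat))
```

In this sketch the weighted sum changes the feature that is passed downstream, while the alignment term would be added to the training objective and leaves the features themselves untouched.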

Authors (6)
  1. Ke Liang
  2. Sihang Zhou
  3. Yue Liu
  4. Lingyuan Meng
  5. Meng Liu
  6. Xinwang Liu