
Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation (2404.09468v2)

Published 15 Apr 2024 in cs.AI

Abstract: Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given knowledge graphs, collaboratively leveraging structural information from the triples and multi-modal information of the entities to overcome the inherent incompleteness. Existing MMKGC methods usually extract multi-modal features with pre-trained models, resulting in coarse handling of multi-modal entity information, overlooking the nuanced, fine-grained semantic details and their complex interactions. To tackle this shortfall, we introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities and enhance the MMKGC performance. Motivated by the tokenization technology, MyGO tokenizes multi-modal entity information as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder. To further augment the multi-modal representations, MyGO incorporates fine-grained contrastive learning to highlight the specificity of the entity representations. Experiments on standard MMKGC benchmarks reveal that our method surpasses 19 of the latest models, underlining its superior performance. Code and data can be found in https://github.com/zjukg/MyGO

Authors (8)
  1. Yichi Zhang
  2. Zhuo Chen
  3. Lingbing Guo
  4. Yajing Xu
  5. Binbin Hu
  6. Ziqi Liu
  7. Huajun Chen
  8. Wen Zhang

Summary

  • The paper introduces an innovative approach that integrates tokenization, fusion, and augmentation to build detailed multi-modal entity representations.
  • It systematically combines diverse data modalities to enhance the granularity and interpretability of entity representations.
  • Experimental evaluations demonstrate improved performance across benchmarks, underscoring significant implications for multi-modal applications.

Comprehensive Overview of ACM's acmart LaTeX Document Class and Its Features

Introduction and Purpose of ACM's Template

The ACM consolidated article template, utilizing the acmart document class, serves as a unified LaTeX style for various ACM publications. It integrates essential features such as accessibility and metadata-extraction functions, crucial for future expansions of the ACM Digital Library. This document class supports different stages of publication across numerous ACM platforms, simplifying the publication process for both new and seasoned authors within the ACM community.

Utilization across Publications

The acmart document class is adaptable for an array of documentation types, from dual-anonymous initial submissions to camera-ready journal articles. This versatility is achieved through specific template styles and parameters:

  • Journal Styles: Different ACM journals utilize various styles such as acmsmall, acmlarge, and acmtog, each catering to the particular needs of the journal's focus and format.
  • Conference Proceedings: Most proceedings use the acmconf style, with SIG-specific adaptations such as sigchi for SIGCHI articles or sigplan for SIGPLAN conferences.

The choice of template style dictates the formatting nuances of the publication, ensuring consistency and adherence to ACM's publication standards.
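As a sketch of how this selection works (the venue pairings in the comments are illustrative), the template style is simply passed as an option to the document class, so switching venues usually means changing one option rather than the document body:

```latex
% Compact single-column style used by many ACM journals
\documentclass[acmsmall]{acmart}

% Alternatives, depending on the venue:
% \documentclass[acmlarge]{acmart}  % large single-column journal format
% \documentclass[acmtog]{acmart}    % two-column journal format (e.g. TOG)
% \documentclass[sigconf]{acmart}   % conference proceedings
```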

Key Features and Parameters

The template supports numerous parameters that further refine the publication output to meet specific requirements, such as anonymous and review for double-blind review processes, or screen for color hyperlinks. Detailed guidance on these parameters enables authors to enhance the accessibility and functionality of their documents effectively.
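As a hedged example, these parameters can be combined for a dual-anonymous initial submission (the exact option set depends on the venue's call for papers):

```latex
% review adds line numbers and a submission banner; anonymous suppresses
% author identities; screen=true produces colored hyperlinks for on-screen reading.
\documentclass[sigconf,review,anonymous,screen=true]{acmart}
```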

Adhering to Formatting Standards

The introduction of the acmart class necessitates strict adherence to formatting standards. Modifications to template elements like margins, typefaces, or line spacing are generally prohibited to maintain a uniform appearance across publications. Violations of these standards require document revision, emphasizing the importance of following the preset guidelines closely.

Typeface and Presentation Requirements

Use of the “Libertine” typeface family is mandated, providing a standardized visual aesthetic across ACM publications. Proper capitalization and presentation of titles and subtitles are equally important for a clear, professional appearance.
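For instance, the title and an optional subtitle are declared with dedicated commands; the short form in square brackets, used for running headers, is optional (the short title below is an illustrative choice):

```latex
\title[MyGO]{Tokenization, Fusion, and Augmentation}
\subtitle{Towards Fine-grained Multi-modal Entity Representation}
```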

Author and Affiliation Documentation

Accurate metadata identification is imperative, necessitating detailed documentation of each author and their affiliations. This structured approach aids in the proper indexing and accessibility of the paper within the ACM ecosystem.
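A minimal sketch of one author block (the name, ORCID, and addresses are placeholders); each author receives their own \author command, and shared affiliations are repeated per author rather than grouped:

```latex
\author{Ada Lovelace}
\orcid{0000-0000-0000-0001}
\affiliation{%
  \institution{Example University}
  \city{Hangzhou}
  \country{China}}
\email{ada@example.edu}
```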

Rights and Licensing

Authors must manage rights information meticulously, including integrating specific LaTeX commands provided by ACM post-rights form completion. This ensures legal compliance and proper attribution of the published work.
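The rights block typically looks like the following; the values shown are placeholders, since ACM supplies the exact commands only after the rights form is completed:

```latex
\setcopyright{acmlicensed}
\copyrightyear{2024}
\acmYear{2024}
\acmDOI{XXXXXXX.XXXXXXX}
```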

Taxonomic Tools and Classification

Authors are encouraged to use the ACM Computing Classification System for better taxonomy and indexing of their work. Additionally, user-defined keywords offer a flexible tool for describing the research in more accessible terms to enhance discoverability and relevance in digital searches.
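A sketch of both mechanisms follows; the concept entry is illustrative, and in practice the CCSXML block is generated by the ACM CCS web tool and pasted in verbatim alongside matching \ccsdesc commands:

```latex
\begin{CCSXML}
<ccs2012>
 <concept>
  <concept_id>10010147.10010178</concept_id>
  <concept_desc>Computing methodologies~Artificial intelligence</concept_desc>
  <concept_significance>500</concept_significance>
 </concept>
</ccs2012>
\end{CCSXML}
\ccsdesc[500]{Computing methodologies~Artificial intelligence}

% Author-chosen keywords for discoverability
\keywords{multi-modal knowledge graphs, entity representation, tokenization}
```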

Conclusion: Implications and Future Adaptations

The acmart LaTeX document class standardizes ACM's publication process, ensuring a consistent and professional presentation of scholarly work. It accommodates the evolving needs of digital publishing with an emphasis on accessibility and metadata completeness. Future enhancements will likely expand this framework with advanced digital publishing tools and greater customization options, potentially increasing the engagement and reach of ACM publications.

The meticulous structuring and detailed parameterization of acmart reflect ACM's commitment to high standards in disseminating scientific knowledge, ensuring that the system meets the dynamic needs of its diverse academic audience.