
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base (2312.10417v2)

Published 16 Dec 2023 in cs.AI

Abstract: Multimodal knowledge bases (MMKBs) provide cross-modal aligned knowledge crucial for multimodal tasks. However, the images in existing MMKBs are generally collected for entities in encyclopedia knowledge graphs. Therefore, detailed groundings of visual semantics with linguistic concepts are lacking, which are essential for the visual concept cognition ability of multimodal models. Addressing this gap, we introduce M^2ConceptBase, the first concept-centric MMKB. M^2ConceptBase models concepts as nodes with associated images and detailed textual descriptions. We propose a context-aware multimodal symbol grounding approach to align concept-image and concept-description pairs using context information from image-text datasets. Comprising 951K images and 152K concepts, M^2ConceptBase links each concept to an average of 6.27 images and a single description, ensuring comprehensive visual and textual semantics. Human studies confirm more than 95% alignment accuracy, underscoring its quality. Additionally, our experiments demonstrate that M^2ConceptBase significantly enhances VQA model performance on the OK-VQA task. M^2ConceptBase also substantially improves the fine-grained concept understanding capabilities of multimodal LLMs through retrieval augmentation in two concept-related tasks, highlighting its value.
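
To make the concept-centric structure and the retrieval-augmentation idea from the abstract concrete, here is a minimal sketch in Python. The ConceptEntry schema, the retrieve_concepts keyword lookup, and the build_augmented_prompt helper are hypothetical illustrations, not the paper's released data format or API; the actual knowledge base links each concept to grounded images and a description, which retrieval augmentation then supplies to a multimodal LLM at answer time.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConceptEntry:
    """One node in a concept-centric MMKB: a concept with its aligned
    textual description and grounded images (hypothetical schema)."""
    concept: str
    description: str
    image_paths: List[str] = field(default_factory=list)


def retrieve_concepts(question: str, kb: Dict[str, ConceptEntry]) -> List[ConceptEntry]:
    """Naive keyword match standing in for the paper's retrieval step."""
    tokens = question.lower().split()
    return [entry for name, entry in kb.items() if name.lower() in tokens]


def build_augmented_prompt(question: str, kb: Dict[str, ConceptEntry]) -> str:
    """Prepend retrieved concept descriptions to the question so a
    multimodal LLM sees fine-grained concept knowledge when answering."""
    retrieved = retrieve_concepts(question, kb)
    context = "\n".join(f"{e.concept}: {e.description}" for e in retrieved)
    return f"Concept knowledge:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    kb = {
        "okapi": ConceptEntry(
            concept="okapi",
            description="A forest-dwelling giraffid mammal with striped hindquarters.",
            image_paths=["images/okapi_001.jpg"],
        ),
    }
    print(build_augmented_prompt("What does an okapi eat?", kb))
```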

Authors (6)
  1. Zhiwei Zha (2 papers)
  2. Jiaan Wang (35 papers)
  3. Zhixu Li (43 papers)
  4. Xiangru Zhu (4 papers)
  5. Wei Song (129 papers)
  6. Yanghua Xiao (151 papers)
Citations (1)