
MEDMKG: Benchmarking Medical Knowledge Exploitation with Multimodal Knowledge Graph (2505.17214v1)

Published 22 May 2025 in cs.AI

Abstract: Medical deep learning models depend heavily on domain-specific knowledge to perform well on knowledge-intensive clinical tasks. Prior work has primarily leveraged unimodal knowledge graphs, such as the Unified Medical Language System (UMLS), to enhance model performance. However, integrating multimodal medical knowledge graphs remains largely underexplored, mainly due to the lack of resources linking imaging data with clinical concepts. To address this gap, we propose MEDMKG, a Medical Multimodal Knowledge Graph that unifies visual and textual medical information through a multi-stage construction pipeline. MEDMKG fuses the rich multimodal data from MIMIC-CXR with the structured clinical knowledge from UMLS, utilizing both rule-based tools and LLMs for accurate concept extraction and relationship modeling. To ensure graph quality and compactness, we introduce Neighbor-aware Filtering (NaF), a novel filtering algorithm tailored for multimodal knowledge graphs. We evaluate MEDMKG across three tasks under two experimental settings, benchmarking twenty-four baseline methods and four state-of-the-art vision-language backbones on six datasets. Results show that MEDMKG not only improves performance in downstream medical tasks but also offers a strong foundation for developing adaptive and robust strategies for multimodal knowledge integration in medical artificial intelligence.

Authors (6)
  1. Xiaochen Wang (32 papers)
  2. Yuan Zhong (70 papers)
  3. Lingwei Zhang (7 papers)
  4. Lisong Dai (7 papers)
  5. Ting Wang (213 papers)
  6. Fenglong Ma (66 papers)

Summary

Benchmarking Medical Knowledge Exploitation with Multimodal Knowledge Graph

The paper "Benchmarking Medical Knowledge Exploitation with Multimodal Knowledge Graph" presents an approach to integrating visual and textual medical data through a multimodal knowledge graph, henceforth referred to as M-KG. This integration addresses the limitations of unimodal medical knowledge graphs on tasks requiring multimodal input, such as medical visual question answering (VQA) and text-image retrieval. M-KG combines the multimodal data of MIMIC-CXR with the structured clinical knowledge of UMLS, using LLMs alongside rule-based systems to extract clinical concepts and their relationships, establishing a robust foundation for medical AI applications.

Construction and Methodology

The development of M-KG involved a multi-stage process beginning with the extraction and integration of data from MIMIC-CXR and the structured framework of UMLS. This lays the groundwork for linking medical imaging data with corresponding textual concepts, filling a significant gap in multimodal data resources. The methodology also incorporates a novel Neighbor-aware Filtering (NaF) algorithm, which refines the knowledge graph by prioritizing informative medical images through a scoring system based on the distinctiveness of their associated concepts and relations.
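The paper's exact NaF formulation is not reproduced in this summary; a minimal sketch of the neighbor-aware idea, under the assumption that distinctiveness is measured by the inverse frequency of an image's neighboring concepts, might look like:

```python
from collections import Counter

def naf_scores(image_neighbors):
    """Score each image by the distinctiveness (inverse global frequency)
    of its neighboring concepts; rarer neighbors make an image more
    informative.

    image_neighbors: dict mapping image id -> list of concept ids.
    """
    # Global frequency of each concept across all images.
    freq = Counter(c for concepts in image_neighbors.values() for c in concepts)
    scores = {}
    for img, concepts in image_neighbors.items():
        if not concepts:
            scores[img] = 0.0
            continue
        # Average inverse neighbor frequency: distinctive neighbors score higher.
        scores[img] = sum(1.0 / freq[c] for c in concepts) / len(concepts)
    return scores

def filter_images(image_neighbors, keep_ratio=0.5):
    """Keep the top fraction of images by neighbor-aware score."""
    scores = naf_scores(image_neighbors)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]
```

The score and the `keep_ratio` cutoff are illustrative choices; the actual algorithm scores both concepts and relations and may aggregate them differently.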

In terms of processing, the approach uses a hybrid of rule-based and LLM methodologies for concept identification and disambiguation. This dual strategy capitalizes on the structured nature of rule-based systems for concept extraction and on the contextual reasoning capabilities of LLMs for precise alignment and classification within the graph. The resulting M-KG shows strong potential for enriching medical AI tasks that require deep integration of heterogeneous data types.
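A schematic of this two-stage idea, with a toy lexicon standing in for a UMLS-derived dictionary and a stub `llm_disambiguate` standing in for the LLM call (both are assumptions, not the paper's implementation), could be:

```python
import re

# Toy concept dictionary standing in for a UMLS-derived lexicon (assumption).
LEXICON = {
    "cardiomegaly": "C0018800",
    "pleural effusion": "C0032227",
    "effusion": "C0013687",
}

def rule_based_candidates(report_text):
    """Stage 1: fast, structured extraction via dictionary matching."""
    text = report_text.lower()
    return [(term, cui) for term, cui in LEXICON.items()
            if re.search(r"\b" + re.escape(term) + r"\b", text)]

def llm_disambiguate(candidates, context):
    """Stage 2: hypothetical LLM step that resolves overlapping or
    ambiguous matches using context. This stub simply prefers the
    longest span when matches overlap."""
    kept = []
    for term, cui in sorted(candidates, key=lambda tc: -len(tc[0])):
        if not any(term in t for t, _ in kept):
            kept.append((term, cui))
    return kept

report = "Findings: cardiomegaly with small pleural effusion."
concepts = llm_disambiguate(rule_based_candidates(report), report)
```

In the real pipeline the second stage would be an actual LLM call with the surrounding sentence as context; the stub only illustrates where that reasoning slots in.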

Evaluation and Results

The paper rigorously benchmarks M-KG across several tasks—link prediction, knowledge-augmented text-image retrieval, and medical VQA—demonstrating its potential to enhance downstream medical AI performance. An extensive evaluation spans 24 baseline methods, showing M-KG's capacity to improve upon existing methodologies by providing richer context, thereby bridging the semantic mismatch commonly observed in multimodal datasets.

In link prediction tasks, models employing translation-based embeddings such as TransD and TransE yielded superior performance, confirming their utility in capturing complex relational data within multimodal contexts. In knowledge-augmented text-image retrieval, M-KG integrated with retrieval models such as KnowledgeCLIP demonstrated significant performance gains, particularly in low-rank settings, underscoring M-KG's role in enabling accurate retrieval over medical data.
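The translation-based intuition behind TransE is that a valid triple (h, r, t) satisfies h + r ≈ t in embedding space. A minimal sketch with made-up 3-d embeddings (illustrative values, not trained weights from the paper):

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: higher (less negative) when h + r ≈ t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 3-d embeddings for entities and one relation (hypothetical values).
emb = {
    "opacity":       [0.9, 0.1, 0.0],
    "pneumonia":     [1.0, 0.1, 0.5],
    "fracture":      [0.0, 0.9, 0.2],
    "suggestive_of": [0.1, 0.0, 0.5],  # relation vector
}

def rank_tails(head, relation, candidates):
    """Link prediction: rank candidate tails for (head, relation, ?)."""
    h, r = emb[head], emb[relation]
    return sorted(candidates,
                  key=lambda t: transe_score(h, r, emb[t]),
                  reverse=True)
```

Here `rank_tails("opacity", "suggestive_of", ...)` ranks "pneumonia" first because the toy vectors were chosen so that opacity + suggestive_of lands on it exactly; TransD follows the same scheme with entity- and relation-specific projection matrices.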

Similarly, in medical VQA tasks, M-KG-driven frameworks such as MR-MKG outperformed alternatives, underscoring the efficacy of structured, knowledge-augmented learning mechanisms for more informed reasoning and decision-making in AI models.
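The knowledge-augmentation pattern common to these VQA frameworks can be sketched as retrieving graph triples related to the question and feeding them to the model as extra context; the function names and triple format below are illustrative assumptions, not the MR-MKG interface:

```python
def retrieve_facts(kg_triples, question_concepts, k=2):
    """Retrieve triples whose head or tail matches a question concept."""
    hits = [t for t in kg_triples
            if t[0] in question_concepts or t[2] in question_concepts]
    return hits[:k]

def augment_question(question, facts):
    """Prepend retrieved knowledge as context for a VQA model's text input."""
    context = "; ".join(f"{h} {r} {t}" for h, r, t in facts)
    return f"Knowledge: {context}. Question: {question}"

# Toy knowledge graph fragment (hypothetical triples).
kg = [("cardiomegaly", "indicates", "heart failure"),
      ("opacity", "suggestive_of", "pneumonia")]

prompt = augment_question("Is cardiomegaly present?",
                          retrieve_facts(kg, {"cardiomegaly"}))
```

Frameworks like MR-MKG inject the retrieved subgraph through learned embeddings rather than plain text, but the retrieve-then-condition structure is the same.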

Implications and Future Directions

The introduction of M-KG carries significant implications for medical AI, particularly for improving diagnostic tools and decision-support systems through richer cross-modal linking and relational understanding. The provision of a publicly accessible dataset repository further supports collaborative development and research, fostering innovation across a range of medical AI applications.

Looking forward, M-KG could be refined by incorporating additional data sources and modalities, and by exploring adaptive learning frameworks that interface dynamically with LLM-based pretraining and fine-tuning. Such integration promises to further enhance robustness and generalization across diverse healthcare scenarios. Future work should also investigate backbones that remain architecture-agnostic yet sensitive to the structure inherent in multimodal knowledge graphs, enabling broader applicability and resilience across clinical environments.

In summary, the paper offers substantial contributions to bridging multimodal data for improved medical AI, presenting promising avenues for enhanced and nuanced exploitation of medical knowledge through structured integration of diverse data sources.
