DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains (2506.00708v1)

Published 31 May 2025 in cs.AI, cs.CL, and cs.LG

Abstract: Knowledge graph completion (KGC) aims to predict missing triples in knowledge graphs (KGs) by leveraging existing triples and textual information. Recently, generative LLMs have been increasingly employed for graph tasks. However, current approaches typically encode graph context in textual form, which fails to fully exploit the potential of LLMs for perceiving and reasoning about graph structures. To address this limitation, we propose DrKGC (Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion). DrKGC employs a flexible lightweight model training strategy to learn structural embeddings and logical rules within the KG. It then leverages a novel bottom-up graph retrieval method to extract a subgraph for each query guided by the learned rules. Finally, a graph convolutional network (GCN) adapter uses the retrieved subgraph to enhance the structural embeddings, which are then integrated into the prompt for effective LLM fine-tuning. Experimental results on two general domain benchmark datasets and two biomedical datasets demonstrate the superior performance of DrKGC. Furthermore, a realistic case study in the biomedical domain highlights its interpretability and practical utility.


Summary of DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains

The paper presents DrKGC, a framework for knowledge graph completion (KGC) that combines LLMs with structural information from knowledge graphs (KGs) in both general and biomedical domains. Rather than encoding graph context purely as text, DrKGC retrieves a query-specific subgraph and uses it to supply the structural information that existing LLM-based approaches tend to lose.

Core Contributions

DrKGC targets three limitations of prior approaches: loss of structural information, static embeddings, and non-specific predictions from LLMs. It addresses them with four components applied in sequence:

Flexible Lightweight Model Training: A lightweight model is trained on the KG to learn structural embeddings and logical rules.
Dynamic Subgraph Retrieval: A bottom-up retrieval strategy, guided by the learned logical rules, extracts a subgraph for each query so that the model focuses on the relationships relevant to that query.
Graph Convolutional Network (GCN) Adapter: A GCN adapter uses the retrieved subgraph to enhance the structural embeddings, which are then integrated into the prompt for LLM fine-tuning.
Integration with LLMs: The enhanced embeddings are injected into the LLM prompt, helping the model produce relevant, contextually grounded predictions, notably for complex entities and relationships in biomedical data.
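
The retrieval and adapter steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy triples, the rule format (a query relation mapped to relation paths), and the function names `retrieve_subgraph` and `mean_aggregate` are all assumptions made for illustration, and the aggregation step is a plain neighbour average that stands in for a trained GCN layer (no learned weights or nonlinearity).

```python
from collections import defaultdict

# Toy KG as (head, relation, tail) triples; all names are hypothetical.
TRIPLES = [
    ("aspirin", "targets", "PTGS2"),
    ("PTGS2", "associated_with", "inflammation"),
    ("aspirin", "interacts_with", "warfarin"),
    ("ibuprofen", "targets", "PTGS2"),
]

# Assumed rule format: query relation -> list of relation paths (rule bodies).
RULES = {"treats": [["targets", "associated_with"]]}


def build_index(triples):
    """Map (head, relation) to the list of tails for fast path following."""
    index = defaultdict(list)
    for h, r, t in triples:
        index[(h, r)].append(t)
    return index


def retrieve_subgraph(head, relation, triples, rules):
    """Collect the triples on rule-body paths that start at `head`."""
    index = build_index(triples)
    subgraph = set()
    for body in rules.get(relation, []):
        # Each frontier item is (current entity, triples traversed so far).
        frontier = [(head, ())]
        for rel in body:
            frontier = [
                (t, path + ((e, rel, t),))
                for e, path in frontier
                for t in index.get((e, rel), [])
            ]
        # Only paths that completed the whole rule body contribute.
        for _, path in frontier:
            subgraph.update(path)
    return subgraph


def mean_aggregate(embeddings, subgraph):
    """One GCN-style step: average each node's vector with its neighbours'."""
    neighbours = defaultdict(set)
    for h, _, t in subgraph:
        neighbours[h].add(t)
        neighbours[t].add(h)
    updated = {}
    for node, nbrs in neighbours.items():
        group = [embeddings[node]] + [embeddings[n] for n in sorted(nbrs)]
        dim = len(embeddings[node])
        updated[node] = [sum(v[i] for v in group) / len(group) for i in range(dim)]
    return updated


# Hypothetical 2-dimensional structural embeddings for the toy entities.
EMBEDDINGS = {
    "aspirin": [1.0, 0.0],
    "PTGS2": [0.0, 1.0],
    "inflammation": [1.0, 1.0],
}

subgraph = retrieve_subgraph("aspirin", "treats", TRIPLES, RULES)
refined = mean_aggregate(EMBEDDINGS, subgraph)
```

For the query (aspirin, treats, ?), only the two triples on the `targets` then `associated_with` path are retrieved, so the unrelated `interacts_with` edge never reaches the aggregation step. In the framework described above, the refined vectors would then be passed into the LLM prompt; that step is omitted here.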

Numerical Results and Implications

The experiments cover two general-domain KGs (WN18RR and FB15k-237) and two biomedical KGs (PharmKG and PrimeKG). DrKGC reports higher MRR and Hits@K than state-of-the-art baselines, including structure-based methods (TransE, CompGCN), rule-based approaches (NCRL), and other LLM-integrated methods (COSIGN, DIFT). These results suggest that DrKGC mitigates entity ambiguity and relation diversity, both of which matter for practical applications such as drug repurposing and other areas of biomedical research.
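
For readers unfamiliar with the two metrics, the following sketch shows how MRR and Hits@K are conventionally computed from the rank of the correct entity for each test query. The rank values are made up for illustration and are not results from the paper; the function names are likewise assumptions.

```python
def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank over all queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test queries.
example_ranks = [1, 2, 4, 10]

example_mrr = mrr(example_ranks)
example_hits1 = hits_at_k(example_ranks, 1)
example_hits3 = hits_at_k(example_ranks, 3)
example_hits10 = hits_at_k(example_ranks, 10)
```

Both metrics lie in (0, 1], and higher is better. MRR rewards placing the correct entity near the top of the ranking, while Hits@K only checks whether it appears within the first K candidates.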

Future Prospects

DrKGC's dynamic integration of KG structure with LLMs is particularly relevant to data-intensive or fast-evolving fields such as biomedicine. By improving interpretability where machine reasoning meets domain-specific knowledge, it could support better-informed decision-making in complex medical and scientific inquiries. Future research may further optimize the retrieval strategy and embedding integration, and may extend DrKGC to broader graph tasks such as question answering and reasoning.

In summary, the paper combines subgraph retrieval with LLM augmentation to bring structure-aware techniques to KGC, improving both predictive performance and contextual understanding.


Authors (7)

Yongkang Xiao
Sinian Zhang
Yi Dai
Huixue Zhou
Jue Hou
Jie Ding
Rui Zhang
