KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment (2502.06472v1)

Published 10 Feb 2025 in cs.CL, cs.AI, cs.CE, and cs.DL

Abstract: Maintaining comprehensive and up-to-date knowledge graphs (KGs) is critical for modern AI systems, but manual curation struggles to scale with the rapid growth of scientific literature. This paper presents KARMA, a novel framework employing multi-agent LLMs to automate KG enrichment through structured analysis of unstructured text. Our approach employs nine collaborative agents, spanning entity discovery, relation extraction, schema alignment, and conflict resolution, that iteratively parse documents, verify extracted knowledge, and integrate it into existing graph structures while adhering to domain-specific schema. Experiments on 1,200 PubMed articles from three different domains demonstrate the effectiveness of KARMA in knowledge graph enrichment, with the identification of up to 38,230 new entities while achieving 83.1% LLM-verified correctness and reducing conflict edges by 18.6% through multi-layer assessments.

Summary

  • The paper presents a novel multi-agent framework that decomposes knowledge graph enrichment into specialized sub-tasks, significantly boosting extraction accuracy and scalability.
  • Experimental results on 1,200 PubMed articles reveal that KARMA outperforms single-agent methods, with DeepSeek-v3 achieving notable entity coverage gains in the Genomics domain.
  • Ablation studies emphasize the critical roles of the summarizer, conflict resolution, and evaluator agents in maintaining consistency, precision, and overall KG quality.

The paper presents KARMA, a novel multi-agent framework leveraging LLMs for automated knowledge graph (KG) enrichment from unstructured scientific text. The core problem addressed is the difficulty of keeping KGs comprehensive and up-to-date given the explosion of scientific literature, which traditional manual or rule-based methods cannot scale to handle, especially in specialized domains like biomedicine.

KARMA tackles this challenge by decomposing the complex task of KG enrichment into a series of smaller, manageable sub-tasks, each handled by a specialized LLM-based agent. This multi-agent architecture, orchestrated by a Central Controller Agent (CCA), allows for collaboration, cross-agent verification, and iterative refinement, aiming to improve accuracy and robustness compared to monolithic single-agent approaches.

The framework comprises nine distinct agents (a minimal orchestration sketch follows the list):

  1. Ingestion Agents (IA): Retrieve, normalize, and extract metadata from raw documents (e.g., PDFs from PubMed).
  2. Reader Agents (RA): Parse normalized text into relevant segments and assign relevance scores based on domain knowledge and structural cues.
  3. Summarizer Agents (SA): Condense relevant segments into concise summaries while preserving key entities and relationships, reducing downstream processing load.
  4. Entity Extraction Agents (EEA): Identify domain-specific entities from summaries and normalize them to canonical forms using ontology-guided mapping.
  5. Relationship Extraction Agents (REA): Infer relationships between identified and normalized entities based on the text, allowing for multi-label predictions.
  6. Schema Alignment Agents (SAA): Map newly extracted entities or relations to the existing KG schema types or flag them for potential ontology expansion.
  7. Conflict Resolution Agents (CRA): Detect potential contradictions between new triplets and existing KG knowledge and resolve them, potentially using an LLM-based debate mechanism.
  8. Evaluator Agents (EA): Aggregate various verification signals (confidence, clarity, relevance) from previous stages to compute final scores for each candidate triplet and decide on its integration into the KG.
  9. Central Controller Agent (CCA): Orchestrates the workflow, prioritizes tasks (using an LLM-based utility function inspired by multi-armed bandits), and allocates resources across the other agents.
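
To make the division of labor concrete, the sketch below chains hypothetical agent functions in the order listed above. It is a minimal illustration, not the paper's implementation: the `call_llm` helper, prompt strings, thresholds, and data classes are assumptions, and PDF parsing, ontology-guided normalization, and the LLM-based debate step are omitted.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    head: str
    relation: str
    tail: str
    score: float = 0.0


def call_llm(prompt: str) -> str:
    """Placeholder for whichever backbone is used (e.g. GLM-4, GPT-4o, DeepSeek-v3)."""
    raise NotImplementedError  # plug in a real API client here


# IA: normalize a raw document into plain text (PDF/metadata handling omitted).
def ingest(raw_text: str) -> str:
    return raw_text.strip()


# RA: segment the text and keep only segments the LLM scores as relevant.
def read(text: str, threshold: float = 0.5) -> list[str]:
    segments = [s for s in text.split("\n\n") if s.strip()]
    return [s for s in segments
            if float(call_llm(f"Rate 0-1 the domain relevance of:\n{s}")) >= threshold]


# SA: condense each relevant segment while preserving entities and relations.
def summarize(segments: list[str]) -> list[str]:
    return [call_llm(f"Summarize, keeping every entity and relation:\n{s}")
            for s in segments]


# EEA: extract domain entities (ontology-guided canonicalization omitted).
def extract_entities(summary: str) -> list[str]:
    raw = call_llm(f"List the domain entities in this text, one per line:\n{summary}")
    return [line.strip() for line in raw.splitlines() if line.strip()]


# REA: infer relations between the extracted entities; multi-label output is allowed.
def extract_relations(summary: str, entities: list[str]) -> list[Triple]:
    raw = call_llm(f"Given entities {entities}, list relations as "
                   f"'head|relation|tail' lines:\n{summary}")
    triples = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(Triple(*parts))
    return triples


# SAA + CRA + EA collapsed into one step: score each candidate against the
# existing graph, drop duplicates, and keep only high-scoring triplets.
def verify(triples: list[Triple], kg: set[tuple], threshold: float = 0.7) -> list[Triple]:
    accepted = []
    for t in triples:
        t.score = float(call_llm(
            f"Score 0-1 the correctness of ({t.head}, {t.relation}, {t.tail}) "
            f"given the existing knowledge graph."))
        if t.score >= threshold and (t.head, t.relation, t.tail) not in kg:
            accepted.append(t)
    return accepted


# CCA: reduced here to a fixed sequential pipeline; the paper's controller
# instead prioritizes tasks dynamically with a bandit-inspired utility function.
def enrich(document: str, kg: set[tuple]) -> list[Triple]:
    candidates: list[Triple] = []
    for summary in summarize(read(ingest(document))):
        candidates.extend(extract_relations(summary, extract_entities(summary)))
    return verify(candidates, kg)
```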

The paper highlights three key innovations: the multi-agent architecture enabling cross-agent verification, domain-adaptive prompting strategies for specialized contexts, and a modular design for extensibility.
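
One way to realize the domain-adaptive prompting idea is to prepend a domain-specific instruction to every agent prompt. The templates below are hypothetical examples written for this summary, not prompts taken from the paper.

```python
# Hypothetical domain prefixes; the paper's actual prompts are not reproduced here.
DOMAIN_PROMPTS = {
    "genomics": "You are a genomics curator. Focus on genes, variants, and "
                "regulatory or gene-disease relationships.",
    "proteomics": "You are a proteomics curator. Focus on proteins, complexes, "
                  "modifications, and protein-protein interactions.",
    "metabolomics": "You are a metabolomics curator. Focus on metabolites, "
                    "enzymes, and pathway membership.",
}


def domain_prompt(domain: str, task_prompt: str) -> str:
    # Prepend the domain instruction so each agent call is adapted to its field.
    return f"{DOMAIN_PROMPTS[domain]}\n\n{task_prompt}"
```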

The experimental evaluation is a proof-of-concept conducted on 1,200 PubMed articles across three biomedical domains: Genomics, Proteomics, and Metabolomics. The paper compares KARMA using different LLM backbones (GLM-4 (GLM et al., 18 Jun 2024), GPT-4o (OpenAI et al., 2023), DeepSeek-v3 (DeepSeek-AI et al., 27 Dec 2024)) against a single-agent baseline. Evaluation employs a multi-faceted approach, including LLM-based correctness scores ($R_{LC}$), graph statistics ($\Delta_{Cov}$, $\Delta_{Con}$), quality indicators ($R_{CR}$, $C_{QA}$), and core metrics ($M_{Con}$, $M_{Cla}$, $M_{Rel}$).
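
The summary does not state how the core metrics are combined; a plausible reading, consistent with the Evaluator agent's role of aggregating confidence, clarity, and relevance signals, is a weighted score with an acceptance threshold, for example

$$S(t) = w_{con} M_{Con}(t) + w_{cla} M_{Cla}(t) + w_{rel} M_{Rel}(t), \qquad \text{accept } t \text{ iff } S(t) \ge \tau,$$

where the weights $w_{*}$ and threshold $\tau$ are illustrative placeholders rather than values from the paper.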

Key results demonstrate KARMA's effectiveness:

  • It significantly outperforms a single-agent LLM approach in knowledge extraction quality and quantity.
  • Performance varies across domains, with higher coverage gains observed in the more prevalent Genomics domain (38,230 new entities with DeepSeek-v3).
  • The choice of LLM backbone impacts performance, with DeepSeek-v3 generally leading in scale and coverage, while GPT-4o shows strengths in precision, and GLM-4 exhibits domain-specific capabilities.
  • The ablation study confirms the crucial role of individual agents, particularly the Summarizer (reducing noise), Conflict Resolution (maintaining consistency), and Evaluator (ensuring quality) agents, for achieving high correctness and coherence.

Computational cost analysis shows variations in token usage and processing time across domains, correlated with article complexity and information density.

Limitations discussed include the primary reliance on LLM-based evaluation metrics instead of extensive human expert validation, performance variations in domains with sparse relationships (like Metabolomics), and potential biases inherited from the underlying LLMs. The ethical impact emphasizes the need for human oversight to mitigate bias and ensure accuracy in critical applications like healthcare.

In summary, KARMA presents a practical framework for automating large-scale KG enrichment from text using a collaborative multi-agent LLM system, offering improved scalability, robustness, and accuracy compared to prior methods by systematically decomposing and verifying the extraction process.
