- The paper introduces SHARP, a novel schema-aware agent that employs multi-step reasoning for robust triple verification in knowledge graphs.
- It utilizes a hybrid toolset combining KG APIs, external knowledge retrieval, and expert memory trajectories to achieve high accuracy and explainability.
- Experimental results show SHARP outperforms baseline methods with a 4.2% to 12.9% accuracy gain, delivering explicit and interpretable verification evidence.
Introduction and Motivation
This work introduces SHARP, a schema-hybrid autonomous agent designed for robust, interpretable triple verification in knowledge graphs (KGs). The motivation is that current methods, whether graph-embedding-based, parametric LLM-based, or RAG-style, show marked limitations on complex or long-tail facts. In particular, they tend to rely on a single evidence source, perform static (single-step) inference, and cannot explain their verification decisions transparently. SHARP addresses these constraints through strategic, multi-step reasoning that integrates internal KG structure with external semantic evidence. The agent reformulates triple verification as an explicit planning and evidential reasoning problem, carried out as a sequential interaction with heterogeneous knowledge environments.
Methodology
Agent Architecture
SHARP utilizes a training-free LLM-driven agent that interacts with three core knowledge sources: the knowledge graph G, external world knowledge W (including Wikipedia and web search), and an expert memory bank M composed of reasoning trajectory exemplars. The system decomposes verification into three distinct but interlocked modules:
- Schema-Aware Initialization: The agent retrieves high-quality, analogical reasoning trajectories from M using a semantic encoder, filtered for similarity to the input triple. These trajectories support the generation of a strategic, schema-aware verification plan.
- Iterative ReAct Reasoning: SHARP extends the classic ReAct framework by incorporating plan adherence and deviation correction. At each time step, the agent reasons about current evidence, selects actions (tool invocations), and updates its context based on new observations.
- Hybrid Knowledge Toolset: The agent dynamically invokes a suite of explicit tools: internal KG structure APIs (schema definition, neighbor retrieval, n-hop path search), Wikipedia entity/relationship retrieval, and open-domain web search. A hybrid scoring function balances sparse symbolic matching with dense semantic similarity for evidence selection and re-ranking.
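The hybrid scoring idea in the toolset description can be illustrated with a minimal sketch. The summary does not give the actual scoring function, so everything here is an assumption: Jaccard word overlap stands in for sparse symbolic matching, cosine similarity over character-trigram counts stands in for a dense encoder, and the mixing weight `alpha` and all function names are hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercased whitespace tokens."""
    return text.lower().split()


def sparse_score(query, passage):
    """Jaccard overlap of word sets (assumed stand-in for symbolic matching)."""
    q, p = set(tokens(query)), set(tokens(passage))
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def trigram_vector(text):
    """Character-trigram counts (toy substitute for a dense semantic encoder)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def dense_score(query, passage):
    """Cosine similarity between the two trigram count vectors."""
    a, b = trigram_vector(query), trigram_vector(passage)
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(query, passage, alpha=0.5):
    """Weighted mix of sparse and dense scores; alpha is a hypothetical knob."""
    return alpha * sparse_score(query, passage) + (1 - alpha) * dense_score(query, passage)


def rerank(query, passages, alpha=0.5, top_k=3):
    """Sort candidate evidence by hybrid score and keep the top_k passages."""
    ranked = sorted(passages, key=lambda p: hybrid_score(query, p, alpha), reverse=True)
    return ranked[:top_k]


passages = [
    "Bananas are yellow",
    "Berlin is the capital of Germany",
    "Paris is the capital of France",
]
best = rerank("Paris capital of France", passages, top_k=2)
```

In a real system the two component scores would likely come from a lexical retriever and a learned embedding model; the sketch only shows how a single weight can trade one off against the other during re-ranking.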
Triple verification is cast as a contextual, sequential decision process under an agent policy π_θ, with the explicit objective of maximizing the joint likelihood of correct label prediction and self-consistent evidence chains, given the query triple and all available knowledge resources. The agent’s state is continuously updated as it retrieves, reasons, and observes, ensuring support for complex, multi-hop, and long-tail verifications.
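The sequential decision process described above can be sketched as a reason-act-observe loop. This is an illustrative skeleton under stated assumptions, not SHARP's implementation: the LLM policy is replaced by a scripted stub that follows the plan step by step, and the tool names (`kg_neighbors`, `wiki_lookup`), their canned outputs, and the label strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Step = Tuple[str, str, str]  # (thought, action call, observation)


@dataclass
class AgentState:
    triple: Triple
    plan: List[str]
    history: List[Step] = field(default_factory=list)


def run_agent(triple, plan, policy, tools, max_steps=6):
    """Loop: the policy proposes a thought and an action, the tool returns an observation."""
    state = AgentState(triple, plan)
    for _ in range(max_steps):
        thought, action, arg = policy(state)
        if action == "finish":
            return arg, state.history  # arg carries the predicted label
        if action in tools:
            observation = tools[action](arg)
        else:
            observation = f"unknown tool: {action}"
        state.history.append((thought, f"{action}({arg})", observation))
    return "unverifiable", state.history  # step budget exhausted


def scripted_policy(state):
    """Stub for the LLM policy: execute plan steps in order, then decide."""
    step = len(state.history)
    head, _relation, tail = state.triple
    if step < len(state.plan):
        return f"follow plan step {step + 1}", state.plan[step], head
    evidence = " ".join(obs for _, _, obs in state.history)
    label = "true" if tail in evidence else "false"
    return "enough evidence gathered", "finish", label


# Hypothetical tools with canned outputs, standing in for KG and Wikipedia access.
tools: Dict[str, Callable[[str], str]] = {
    "kg_neighbors": lambda entity: "capital_of -> France",
    "wiki_lookup": lambda entity: "Paris is the capital of France.",
}

label, trace = run_agent(
    ("Paris", "capital_of", "France"),
    ["kg_neighbors", "wiki_lookup"],
    scripted_policy,
    tools,
)
```

The returned `trace` holds the thought, action, and observation of each step, which is the kind of record an explicit evidence chain could be read from.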
Experimental Evaluation
Datasets and Negative Sampling
The empirical analysis utilizes FB15K-237 (focused on transductive, multi-hop reasoning) and Wikidata5M-Ind (covering large-scale, inductive verification with strong long-tail/entity diversity). Type-constrained negative sampling creates realistic, hard negative triples to avoid trivial distinctions.
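As a rough illustration of type-constrained negative sampling, the sketch below corrupts the tail of each positive triple with another entity of the same type and skips any corruption that is already a known fact. The summary does not specify which slot is corrupted or how types are obtained, so the tail-only corruption, the `entity_types` mapping, and the function name are assumptions.

```python
import random


def type_constrained_negatives(triples, entity_types, seed=0):
    """Return one hard negative per positive triple where a same-type substitute exists."""
    rng = random.Random(seed)
    known = set(triples)

    # Group entities by type so replacements stay type-compatible.
    by_type = {}
    for entity, etype in entity_types.items():
        by_type.setdefault(etype, []).append(entity)

    negatives = []
    for head, relation, tail in triples:
        candidates = [
            e for e in by_type[entity_types[tail]]
            if e != tail and (head, relation, e) not in known
        ]
        if candidates:
            negatives.append((head, relation, rng.choice(candidates)))
    return negatives


positives = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
]
entity_types = {
    "Paris": "city",
    "Berlin": "city",
    "France": "country",
    "Germany": "country",
    "Spain": "country",
}
negatives = type_constrained_negatives(positives, entity_types)
```

Because the replacement is always another country here, a verifier cannot reject the negative on type mismatch alone, which is what makes such negatives harder than uniformly sampled ones.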
Baseline Comparisons
SHARP is benchmarked against:
- Graph embedding models (TransE, DistMult, RotatE)
- PLM-based methods (KG-BERT, SimKGC)
- LLMs in zero-shot and CoT/self-consistency-enhanced settings (GPT-3.5-turbo, GPT-4o, Qwen3-max)
- Task-tailored agentic frameworks and RAG pipelines (KGValidator, etc.)
Main Results
SHARP achieves a statistically significant improvement over all baselines, with an accuracy gain of 4.2% on FB15K-237 and 12.9% on Wikidata5M-Ind compared to the strongest prior methods. Notably, it delivers a precision of 98.7% on Wikidata5M-Ind, balancing this with high recall and F1, and providing explicit, evidence-based justification for each decision.
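For reference, the metrics named above can be computed from binary verification labels as in the following sketch. It uses the standard definitions; the function name and the toy labels are illustrative and do not reproduce any reported result.

```python
def verification_metrics(gold, predicted):
    """Accuracy, precision, recall, and F1 for boolean labels (True = triple is valid)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
metrics = verification_metrics(
    gold=[True, True, False, False],
    predicted=[True, False, False, False],
)
```

A high precision with lower recall, as in this toy case, means the verifier rarely accepts a false triple but rejects some true ones.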
Component Contributions and Ablations
Ablation studies indicate that each module (memory augmentation, schema-aware planning, and the KG and external evidence tools) contributes to overall performance. Disabling schema-aware planning or external tool access degrades both accuracy and F1 by over 10% on both datasets. The integration of internal and external evidence is shown to be critical for complex and long-tail fact verification.
Cost and Efficiency
SHARP incurs a modest inference cost (0.6 to 1.1 US cents per sample on average), driven primarily by LLM context usage and tool invocation. Although its latency and cost are higher than those of classical embedding methods, SHARP’s training-free paradigm dramatically reduces deployment overhead and eliminates the need for model retraining or fine-tuning when graphs evolve.
Implications and Future Directions
Practically, SHARP’s architecture is highly applicable to high-stakes, knowledge-intensive domains (medicine, law, etc.) demanding both reliability and rich explanation. Theoretically, the work suggests a paradigm shift: verification tasks are best viewed as multi-source, agentic reasoning challenges rather than black-box classification problems.
The explicit synergy between KG structure and open-text evidence also points toward future research on more granular tool orchestration, human-in-the-loop adversarial data curation, and adaptive schema-driven retrieval across evolving knowledge contexts. SHARP’s framework is extensible to other structured verification domains, opening avenues for more robust, interpretable, and dynamic knowledge management strategies in large-scale AI systems.
Conclusion
SHARP advances knowledge graph triple verification by reframing it as active, schema-driven multi-source reasoning. The introduction of a hybrid toolset, memory-augmented schema planning, and real-time agentic interaction yields state-of-the-art empirical results, enhanced interpretability, and meaningful robustness to noisy or long-tail factual scenarios. The work establishes a rigorous, extensible foundation for principled, agent-based verification across structured and unstructured knowledge domains.