Wikontic: Ontology-Aware KG Pipeline
- Wikontic is an ontology-aware pipeline that builds well-connected knowledge graphs from open-domain text using context-rich triplet extraction and strict Wikidata constraints.
- It systematically validates candidate triplets against Wikidata’s schema, ensuring accurate type and relation alignment while aggressively normalizing entities to reduce duplication.
- Empirical evaluations demonstrate superior performance with high answer coverage, improved F1 scores, and significant token efficiency compared to previous KG construction methods.
Wikontic is a multi-stage pipeline for constructing knowledge graphs (KGs) from open-domain text. It emphasizes ontology awareness and strict alignment with Wikidata schema constraints, aiming to produce compact, well-connected, and verifiable knowledge representations suitable for structured grounding in LLMs. Departing from conventional LLM-KG integration pipelines, which frequently relegate KGs to auxiliary retrieval roles, Wikontic systematically enforces type and relation constraints, organizes extracted triplets with contextually relevant qualifiers, and performs aggressive normalization to minimize entity duplication. Empirical results demonstrate that Wikontic produces superior KGs with high answer-entity coverage, strong benchmark performance, and notable efficiency gains over prior KG construction methods (Chepurova et al., 29 Nov 2025).
1. Motivation and Context
KGs provide structured, verifiable foundations for LLMs, addressing the limitations of unstructured text grounding such as inconsistency, redundancy, and poor entity disambiguation. Despite the proliferation of retrieval-augmented generation workflows, previous LLM-based systems largely utilized KGs as auxiliary tools without explicit focus on the intrinsic quality, compactness, and ontological fidelity of generated graphs. Wikontic targets this gap by introducing a pipeline explicitly designed to maximize ontology consistency and connectivity, informed by Wikidata’s type and relation schema. A plausible implication is an increased potential for downstream explainability and factual reliability in LLM outputs, given the heightened quality of their structured grounding.
2. Multi-Stage Pipeline Overview
Wikontic's construction process comprises several sequential stages:
- Extraction of Candidate Triplets with Qualifiers: The system parses open-domain text to generate candidate KG triplets, each enriched with qualifiers that capture context-specific details.
- Wikidata-Based Type and Relation Constraints: Extracted triplets are filtered according to Wikidata’s entity and relation schemas, enforcing both type correctness and relational validity.
- Entity Normalization: A normalization routine merges duplicate representations, streamlining the graph structure and enhancing connectivity.
This staged approach yields KGs that are compact and consistently aligned with an explicit ontology, supporting high-quality automated reasoning.
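The three stages above can be sketched as a minimal, self-contained pipeline. This is an illustrative reconstruction, not the authors' actual implementation: the `Triplet` structure, the stub extractor, and the schema dictionary are all assumptions, and the real system performs LLM-based extraction and full Wikidata validation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of Wikontic's staged flow; names and structures are
# illustrative assumptions, not the authors' API.

@dataclass
class Triplet:
    subject: str
    relation: str
    obj: str
    qualifiers: dict = field(default_factory=dict)  # context, e.g. {"point in time": "1903"}

def extract_candidates(text: str) -> list[Triplet]:
    """Stage 1 (stubbed): LLM-based extraction of qualifier-rich candidate triplets."""
    return [Triplet("Marie Curie", "award received", "Nobel Prize in Physics",
                    {"point in time": "1903"})]

def passes_schema(t: Triplet, schema: dict) -> bool:
    """Stage 2: keep only triplets whose relation exists in the schema
    (full type checks elided in this sketch)."""
    return schema.get(t.relation) is not None

def normalize(triplets: list[Triplet]) -> list[Triplet]:
    """Stage 3: merge duplicate triplets via a naive casefolded key."""
    seen, out = set(), []
    for t in triplets:
        key = (t.subject.casefold(), t.relation, t.obj.casefold())
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out

def build_kg(text: str, schema: dict) -> list[Triplet]:
    candidates = extract_candidates(text)
    valid = [t for t in candidates if passes_schema(t, schema)]
    return normalize(valid)
```

In the real pipeline, normalization operates over entity surface forms rather than a simple casefold, but the orchestration order (extract, then constrain, then normalize) mirrors the stages described above.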
3. Ontology Consistency and Entity Normalization
By enforcing Wikidata-based constraints on both entity types and admissible relations, Wikontic ensures ontology compliance throughout graph construction. Entity normalization reduces duplication, enhancing the connectivity and compactness of the resulting KG. This normalization step is critical for downstream utility: redundant or fragmented entity representations impede graph traversal, reasoning, and effective grounding in LLMs. The resulting KGs are markedly more ontology-consistent and well-connected, validating the efficacy of the constraint and normalization mechanisms.
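A Wikidata-style type-and-relation check can be illustrated as follows. The property and class identifiers P69 ("educated at"), Q5 (human), and Q2385804 (educational institution) are real Wikidata IDs, but the mini-schema and the entity-type table are hand-coded assumptions for demonstration; the actual pipeline validates against Wikidata's full schema.

```python
# Illustrative Wikidata-style constraint check. The schema and entity typing
# below are hand-coded assumptions, not live Wikidata lookups.

# P69 ("educated at") expects a human subject (Q5) and an
# educational-institution object (Q2385804).
RELATION_CONSTRAINTS = {
    "P69": {"subject_types": {"Q5"}, "object_types": {"Q2385804"}},
}

ENTITY_TYPES = {
    "Alan Turing": {"Q5"},           # human
    "King's College": {"Q2385804"},  # educational institution
    "Enigma": {"Q-placeholder"},     # placeholder type, not a real Wikidata ID
}

def type_valid(subject: str, relation: str, obj: str) -> bool:
    """Reject a triplet unless the relation is known and both arguments
    carry at least one admissible type."""
    c = RELATION_CONSTRAINTS.get(relation)
    if c is None:
        return False  # unknown relation: rejected under strict schema alignment
    return bool(ENTITY_TYPES.get(subject, set()) & c["subject_types"]) and \
           bool(ENTITY_TYPES.get(obj, set()) & c["object_types"])

print(type_valid("Alan Turing", "P69", "King's College"))  # True
print(type_valid("Enigma", "P69", "King's College"))       # False: Enigma is not human
```

Rejecting triplets whose arguments fall outside the relation's admissible types is what keeps the constructed graph ontology-consistent rather than merely well-formed.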
4. Empirical Evaluation and Benchmarking
Wikontic’s pipeline was evaluated on multiple QA and information retention benchmarks:
- MuSiQue (Answer Coverage): The correct answer entity appeared among the generated triplets for 96% of questions, demonstrating high answer coverage.
- HotpotQA: The triplets-only setup achieved 76.0 F1.
- MuSiQue (F1): Wikontic yielded 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that require additional textual context.
- MINE-1 (Information Retention): Attained state-of-the-art retention at 86%, outperforming all prior KG construction methods.
These results substantiate Wikontic’s competitive capability, even when direct text retrieval is omitted, and demonstrate its state-of-the-art performance in retention and coverage among structured extraction methods (Chepurova et al., 29 Nov 2025).
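The answer-coverage metric reported above can be made concrete with a short sketch: it is the fraction of questions whose gold answer entity appears anywhere in the constructed triplets. The function and the toy data below are invented for illustration and do not come from the paper's evaluation.

```python
# Hypothetical illustration of the answer-coverage metric; data is invented.

def answer_coverage(qa_pairs, triplets):
    """qa_pairs: list of (question, answer); triplets: list of (s, r, o).
    Returns the fraction of answers found among triplet subjects/objects."""
    entities = {e.casefold() for t in triplets for e in (t[0], t[2])}
    hits = sum(1 for _, answer in qa_pairs if answer.casefold() in entities)
    return hits / len(qa_pairs) if qa_pairs else 0.0

triplets = [
    ("Alan Turing", "educated at", "King's College"),
    ("Alan Turing", "field of work", "cryptanalysis"),
]
qa = [
    ("Where did Turing study?", "King's College"),
    ("Who worked in cryptanalysis?", "Alan Turing"),
    ("What is the capital of France?", "Paris"),
]
print(round(answer_coverage(qa, triplets), 2))  # 0.67
```

Under this definition, Wikontic's 96% on MuSiQue means the answer entity was recoverable from the graph alone for nearly all questions, without falling back to text retrieval.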
5. Efficiency and Scalability
Wikontic achieves high KG construction efficiency at build time, requiring fewer than 1,000 output tokens for a typical graph: substantially fewer than AriGraph and less than 1/20 the output tokens of GraphRAG. Such efficiency makes KG construction practical in LLM-augmented workflows, enabling large-scale, multi-domain deployment without prohibitive computational overhead.
6. Significance for LLMs and Future Directions
The Wikontic framework demonstrates that strictly enforced ontology alignment, robust entity normalization, and qualifier-rich triplet extraction enhance the suitability of KGs for structured grounding in LLMs. This suggests that future research may prioritize not only the integration of KGs in LLM-based systems but also their intrinsic quality, compactness, and schema alignment. Wikontic offers a scalable and empirically validated blueprint for such KG construction approaches, highlighting the utility of explicit schema constraints and information-centric evaluation in advancing the intersection of knowledge representation and language modeling.