
Wikontic: Ontology-Aware KG Pipeline

Updated 1 January 2026
  • Wikontic is an ontology-aware pipeline that builds well-connected knowledge graphs from open-domain text using context-rich triplet extraction and strict Wikidata constraints.
  • It systematically validates candidate triplets against Wikidata’s schema, ensuring accurate type and relation alignment while aggressively normalizing entities to reduce duplication.
  • Empirical evaluations demonstrate superior performance with high answer coverage, improved F1 scores, and significant token efficiency compared to previous KG construction methods.

Wikontic is a multi-stage pipeline for constructing knowledge graphs (KGs) from open-domain text. It emphasizes ontology awareness and strict alignment with Wikidata schema constraints, aiming to produce compact, well-connected, and verifiable knowledge representations suitable for structured grounding in LLMs. Departing from conventional LLM-KG integration pipelines, which frequently relegate KGs to auxiliary retrieval roles, Wikontic systematically enforces type and relation constraints, organizes extracted triplets with contextually relevant qualifiers, and performs aggressive normalization to minimize entity duplication. Empirical results demonstrate that Wikontic produces superior KGs with high answer-entity coverage, strong benchmark performance, and notable efficiency gains over prior KG construction methods (Chepurova et al., 29 Nov 2025).

1. Motivation and Context

KGs provide structured, verifiable foundations for LLMs, addressing the limitations of unstructured text grounding such as inconsistency, redundancy, and poor entity disambiguation. Despite the proliferation of retrieval-augmented generation workflows, previous LLM-based systems largely utilized KGs as auxiliary tools without explicit focus on the intrinsic quality, compactness, and ontological fidelity of generated graphs. Wikontic targets this gap by introducing a pipeline explicitly designed to maximize ontology consistency and connectivity, informed by Wikidata’s type and relation schema. A plausible implication is improved downstream explainability and factual reliability in LLM outputs, given the higher quality of their structured grounding.

2. Multi-Stage Pipeline Overview

Wikontic's construction process comprises several sequential stages:

  1. Extraction of Candidate Triplets with Qualifiers: The system parses open-domain text to generate candidate KG triplets, each enriched with qualifiers that capture context-specific details.
  2. Wikidata-Based Type and Relation Constraints: Extracted triplets are filtered according to Wikidata’s entity and relation schemas, enforcing both type correctness and relational validity.
  3. Entity Normalization: A normalization routine merges duplicate representations, streamlining the graph structure and enhancing connectivity.

This staged approach yields KGs that are compact and consistently aligned with an explicit ontology, supporting high-quality automated reasoning.
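The three stages can be illustrated with a minimal, self-contained sketch. All names, the toy schema, the type table, and the alias table below are invented for illustration; they are not the authors' implementation or actual Wikidata identifiers.

```python
from dataclasses import dataclass, field

@dataclass
class Triplet:
    subject: str
    relation: str
    obj: str
    qualifiers: dict = field(default_factory=dict)  # context-specific details

# Stage 1 output: candidate triplets with qualifiers (in Wikontic these
# would be produced by LLM-based extraction from open-domain text).
candidates = [
    Triplet("A. Einstein", "educated at", "ETH Zurich", {"end time": "1900"}),
    Triplet("Albert Einstein", "educated at", "ETH Zürich"),
    Triplet("Albert Einstein", "educated at", "relativity"),  # type-invalid object
]

# Stage 2: Wikidata-style constraints: relation -> (subject type, object type).
SCHEMA = {"educated at": ("human", "organization")}
TYPES = {"A. Einstein": "human", "Albert Einstein": "human",
         "ETH Zurich": "organization", "ETH Zürich": "organization",
         "relativity": "concept"}

def valid(t: Triplet) -> bool:
    """Keep a triplet only if its relation is in the schema and both
    endpoints carry the entity types the schema demands."""
    if t.relation not in SCHEMA:
        return False
    s_type, o_type = SCHEMA[t.relation]
    return TYPES.get(t.subject) == s_type and TYPES.get(t.obj) == o_type

# Stage 3: normalization maps duplicate surface forms to canonical labels,
# so equivalent triplets collapse into a single edge.
CANON = {"A. Einstein": "Albert Einstein", "ETH Zürich": "ETH Zurich"}

def normalize(t: Triplet) -> Triplet:
    return Triplet(CANON.get(t.subject, t.subject), t.relation,
                   CANON.get(t.obj, t.obj), t.qualifiers)

kg = {(t.subject, t.relation, t.obj) for t in map(normalize, filter(valid, candidates))}
# The type-invalid candidate is filtered out, and the two valid duplicates
# merge into one canonical edge.
```

In this sketch the deduplication key deliberately ignores qualifiers; a fuller implementation would also have to decide how qualifier sets are merged when duplicate triplets collapse.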

3. Ontology Consistency and Entity Normalization

By enforcing Wikidata-based constraints on both entity types and admissible relations, Wikontic’s pipeline ensures ontology compliance throughout graph construction. Entity normalization reduces duplication, enhancing the connectivity and compactness of the resulting KG. This normalization process is critical for downstream utility, as redundant or fragmented entity representations can impede graph traversal, reasoning, and effective grounding in LLMs. The resulting KGs are markedly more ontology-consistent and well-connected, validating the efficacy of the normalization and constraint mechanisms.

4. Empirical Evaluation and Benchmarking

Wikontic’s pipeline was evaluated on multiple QA and information retention benchmarks:

  • MuSiQue: The correct answer entity was present in the extracted triplets for 96% of questions, demonstrating high answer coverage.
  • HotpotQA: The triplets-only setup achieved 76.0 F1.
  • MuSiQue (F1): Wikontic yielded 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that require additional textual context.
  • MINE-1 (Information Retention): Attained state-of-the-art performance with an 86% retention score, outperforming all prior KG construction methods.

These results substantiate Wikontic’s competitive capability, even when direct text retrieval is omitted, and demonstrate its state-of-the-art performance in retention and coverage among structured extraction methods (Chepurova et al., 29 Nov 2025).
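For context, the F1 figures above follow the standard token-overlap metric used by HotpotQA/MuSiQue-style QA evaluation. The sketch below is the generic metric, not the paper's evaluation code; the official scripts additionally lowercase and strip punctuation and articles before comparison.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1: harmonic mean of precision and recall over
    the multiset of whitespace-separated tokens."""
    pred, ref = prediction.split(), gold.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the analytical engine", "analytical engine"))  # 0.8
```

A benchmark-level F1 score (e.g. the 76.0 on HotpotQA) is the mean of this per-question score across the evaluation set.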

5. Efficiency and Scalability

Wikontic achieves high KG construction efficiency at build time, requiring fewer than 1,000 output tokens for a typical graph. This is approximately 3× fewer tokens than AriGraph and less than 1/20 the output tokens of GraphRAG. Such efficiency presents a scalable solution for KG construction in LLM-augmented workflows, enabling practical deployment for large-scale, multi-domain applications without prohibitive computational overhead.
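Back-of-envelope, the stated ratios imply rough baseline budgets. The figures below are derived from the ratios in the text, not reported numbers from the paper:

```python
# Implied per-graph output-token budgets, taking the stated bounds at face value.
wikontic_tokens = 1_000                    # "fewer than 1,000 output tokens"
arigraph_tokens = 3 * wikontic_tokens      # "approximately 3x fewer" than AriGraph
graphrag_tokens = 20 * wikontic_tokens     # "less than 1/20" of GraphRAG
print(arigraph_tokens, graphrag_tokens)    # ~3,000 and ~20,000 tokens
```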

6. Significance for LLMs and Future Directions

The Wikontic framework demonstrates that strictly enforced ontology alignment, robust entity normalization, and qualifier-rich triplet extraction enhance the suitability of KGs for structured grounding in LLMs. This suggests that future research may prioritize not only the integration of KGs in LLM-based systems but also their intrinsic quality, compactness, and schema alignment. Wikontic offers a scalable and empirically validated blueprint for such KG construction approaches, highlighting the utility of explicit schema constraints and information-centric evaluation in advancing the intersection of knowledge representation and language modeling.

