Wikontic: Ontology-Aligned KG Pipeline

Updated 3 December 2025
  • Wikontic is a multi-stage pipeline that constructs compact, ontology-aligned knowledge graphs from open-domain text.
  • It extracts candidate triplets with qualifiers and enforces Wikidata-based type and relation constraints to ensure high logical consistency.
  • Wikontic achieves state-of-the-art performance on benchmarks like MuSiQue, HotpotQA, and MINE-1 while using significantly fewer output tokens.

Wikontic is a multi-stage pipeline designed to construct Wikidata-aligned, ontology-aware knowledge graphs (KGs) from open-domain text using LLMs. Its primary innovation centers on generating compact, ontology-consistent, and well-connected KGs by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. Wikontic’s approach enables the creation of KGs that achieve state-of-the-art information retention and answer accuracy, while requiring significantly fewer output tokens than previous methods. The system presents a scalable solution for leveraging structured knowledge in LLMs, advancing beyond the retrieval-augmented paradigm in efficiency and knowledge grounding (Chepurova et al., 29 Nov 2025).

1. Motivation and Context

Wikontic addresses fundamental limitations in the deployment of KGs for LLMs: existing systems mostly use KGs as auxiliary retrieval structures, with little scrutiny of their intrinsic quality or compactness. The design goal is to enable KGs to act as primary, verifiable sources of information, directly grounding LLM inference and Q&A, especially in open-domain settings. Emphasis is placed on adherence to established ontologies—specifically Wikidata—and on information efficiency, both for storage and downstream use. The broader research context includes methods such as retrieval-augmented generation and recent KG construction pipelines like AriGraph and GraphRAG, which Wikontic aims to surpass in both quality and efficiency (Chepurova et al., 29 Nov 2025).

2. Pipeline Design and Methodological Stages

The Wikontic pipeline is structured as a sequence of transformation and refinement phases:

  1. Triplet Extraction with Qualifiers: Open-domain texts are processed by LLMs to extract candidate subject–predicate–object triplets augmented with qualifiers, capturing both primary relational structure and relevant context.
  2. Wikidata-Aligned Constraint Enforcement: Type and relation constraints derived from Wikidata are systematically applied. Only triplets and qualifiers that satisfy these ontological requirements are retained, ensuring maximal alignment to Wikidata’s logical and semantic schema.
  3. Entity Normalization and Deduplication: Entities are canonicalized to resolve synonyms and textual variations, suppressing duplication and favoring compact graph structures.
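The three stages above can be sketched as a minimal pipeline. All names here (the `Triplet` structure, the stage functions, the toy allow-list and alias table) are illustrative placeholders, not Wikontic's actual implementation — in the real system, stage 1 is an LLM call and stage 2 draws on Wikidata's constraint data:

```python
# Minimal sketch of the three-stage pipeline; structures and helpers
# are hypothetical stand-ins for Wikontic's real components.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str
    qualifiers: tuple = ()  # e.g. (("point in time", "1903"),)

# Stage 1: candidate extraction (an LLM call in the real pipeline;
# here a stub returning pre-extracted candidates).
def extract_triplets(text: str) -> list[Triplet]:
    return [
        Triplet("Marie Curie", "award received", "Nobel Prize in Physics",
                qualifiers=(("point in time", "1903"),)),
        Triplet("Marie Curie", "tastes like", "chicken"),  # off-schema noise
    ]

# Stage 2: keep only triplets whose predicate is a known relation
# (a toy allow-list stands in for real Wikidata type/relation constraints).
ALLOWED_PREDICATES = {"award received", "educated at", "spouse"}

def enforce_constraints(triplets: list[Triplet]) -> list[Triplet]:
    return [t for t in triplets if t.predicate in ALLOWED_PREDICATES]

# Stage 3: canonicalize surface forms so synonyms collapse to one node.
ALIASES = {"M. Curie": "Marie Curie", "Maria Sklodowska-Curie": "Marie Curie"}

def normalize(triplets: list[Triplet]) -> list[Triplet]:
    canon = lambda e: ALIASES.get(e, e)
    return [Triplet(canon(t.subject), t.predicate, canon(t.obj), t.qualifiers)
            for t in triplets]

def build_kg(text: str) -> list[Triplet]:
    return normalize(enforce_constraints(extract_triplets(text)))

kg = build_kg("...source passage...")
# Only the ontology-consistent "award received" triplet survives filtering.
```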

The result is an ontology-consistent, well-connected KG suitable for direct consumption by LLMs and downstream evaluation frameworks. A plausible implication is that this method may generalize to other ontology-driven KG construction pipelines.

3. Ontology Consistency and Qualifier Usage

A central feature of Wikontic is strict conformity with Wikidata’s ontology. Enforcing type and relation constraints reduces off-schema triplets and ensures high logical validity of the resulting KG. The incorporation of qualifiers extends expressivity, facilitating representation of contextual or temporally qualified facts, in line with Wikidata’s data model. This approach is distinct from earlier methods that emphasize broad coverage at the expense of ontological precision.
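To make the constraint-enforcement idea concrete, the following toy check mirrors Wikidata's subject-type and value-type constraints: each property declares which entity types may appear as subject and object. The constraint table, type map, and function name are invented for illustration, not drawn from the paper:

```python
# Toy Wikidata-style relation constraints: each property lists allowed
# subject types and object types. Entries are invented examples.
CONSTRAINTS = {
    # property: (allowed subject types, allowed object types)
    "educated at": ({"human"}, {"educational institution"}),
    "capital of": ({"city"}, {"country", "state"}),
}

# A stand-in for entity typing (instance-of lookups in real Wikidata).
ENTITY_TYPES = {
    "Alan Turing": "human",
    "King's College, Cambridge": "educational institution",
    "Paris": "city",
    "France": "country",
}

def satisfies_constraints(subj: str, pred: str, obj: str) -> bool:
    if pred not in CONSTRAINTS:
        return False  # off-schema relation: reject outright
    subj_ok, obj_ok = CONSTRAINTS[pred]
    return (ENTITY_TYPES.get(subj) in subj_ok
            and ENTITY_TYPES.get(obj) in obj_ok)

print(satisfies_constraints("Alan Turing", "educated at",
                            "King's College, Cambridge"))  # True
print(satisfies_constraints("Paris", "educated at", "France"))  # False
```

Rejecting type-violating candidates at this stage is what keeps off-schema triplets out of the final graph.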

A notable result is that, on the MuSiQue dataset, the correct answer entity appears in 96% of generated Wikontic triplets, suggesting high recall within the supported schema and significant practical coverage for open-domain QA tasks (Chepurova et al., 29 Nov 2025).

4. Evaluation and Empirical Performance

Wikontic is evaluated using several established benchmarks:

  • MuSiQue: The pipeline achieves 59.8 F1, with answer entity recall of 96% across generated triplets.
  • HotpotQA: A triplets-only configuration—i.e., excluding textual context—yields 76.0 F1, a competitive result.
  • MINE-1: Attains 86% information retention, surpassing prior KG construction approaches.

These metrics indicate that Wikontic, relying entirely on a high-coverage, ontology-consistent structured KG, achieves knowledge grounding comparable to, or better than, retrieval-augmented generation methods that require the full textual context.
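The answer-entity recall figure reported above can be computed straightforwardly: it is the fraction of QA examples whose gold answer entity appears somewhere in the generated triplets. The data format below is an assumption for illustration, not the paper's evaluation harness:

```python
# Sketch of answer-entity recall: fraction of examples whose gold
# answer appears as a subject or object in the generated triplets.
def answer_entity_recall(examples: list[tuple[str, list[tuple]]]) -> float:
    hits = 0
    for gold_answer, triplets in examples:
        entities = {e for (s, p, o) in triplets for e in (s, o)}
        if gold_answer in entities:
            hits += 1
    return hits / len(examples)

examples = [
    ("Paris", [("Paris", "capital of", "France")]),       # hit
    ("Berlin", [("Germany", "instance of", "country")]),  # miss
]
print(answer_entity_recall(examples))  # 0.5
```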

5. Efficiency, Scalability, and Baseline Comparison

Wikontic offers substantial efficiency improvements in build-time token usage. Entire KG construction operates with fewer than 1,000 output tokens, approximately 3× fewer than AriGraph and less than 1/20 the output tokens required by GraphRAG. This reduction provides both practical speedups and lower storage requirements, which is crucial for scaling to large corpora or dynamic information environments (Chepurova et al., 29 Nov 2025).
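As a back-of-the-envelope check, the reported ratios imply rough per-build output-token budgets for the baselines (taking the 1,000-token upper bound at face value):

```python
# Rough baseline budgets implied by the reported ratios.
wikontic_tokens = 1_000                   # upper bound reported
arigraph_tokens = 3 * wikontic_tokens     # "approximately 3x fewer"
graphrag_tokens = 20 * wikontic_tokens    # "less than 1/20"
print(arigraph_tokens, graphrag_tokens)   # 3000 20000
```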

Baseline comparisons indicate that Wikontic matches or outperforms retrieval-augmented frameworks on core extractive QA benchmarks, despite dispensing with retrieval at inference time.

6. Significance and Research Implications

By operationalizing rigorous ontology alignment and highly efficient extraction, Wikontic establishes a new state-of-the-art in LLM-driven KG construction. Its approach demonstrates that high-fidelity, compact structured knowledge can replace or augment document-based retrieval, with implications for future architectures in verifiable machine reasoning, open-domain question answering, and knowledge-based inference. A plausible implication is the emergence of hybrid systems that alternate between purely KG-driven and retrieval-augmented pipelines, depending on task requirements.

7. Limitations and Future Directions

The Wikontic abstract limits discussion of remaining challenges or open questions. However, one plausible concern is coverage beyond Wikidata’s schema or in highly specialized domains. Further work may extend the pipeline to alternative ontologies, explore robustness to noisy input text, or directly investigate procedures for dynamic KG updating. The efficiency gains suggest applicability at web scale, but the interplay between KG compactness and downstream LLM performance remains a topic for continued examination.

References (1)

  • Chepurova et al., 29 Nov 2025.
