
Ontology-Driven KG Generation

Updated 2 April 2026
  • Ontology-driven knowledge graph generation is a method that uses formal ontologies as schemas to guide the extraction, validation, and integration of heterogeneous data sources.
  • The pipeline involves stages like ontology authoring, schema-constrained triple extraction, and logical consistency checks to ensure robust instantiation of KGs.
  • Practical applications include retrieval-augmented generation, multi-hop reasoning, and semantic search, leading to enhanced data consistency and query performance.

Ontology-driven knowledge graph (KG) generation refers to the systematic construction of KGs in which formal ontologies serve as explicit schemas (defining classes, relations, constraints, and hierarchies) that steer the extraction, completion, and validation of factual assertions from diverse data sources such as text, databases, and documents. Unlike ad hoc graph building, ontology-driven pipelines use the ontology to ensure semantic consistency, queryability, explainability, and interoperability for downstream tasks such as retrieval-augmented generation (RAG), reasoning, or analytics.

1. Fundamental Principles and Canonical Architectures

Ontology-driven KG pipelines comprise several phases: ontology authoring or extraction, schema-grounded information extraction, knowledge graph instantiation, and consistency enforcement. The ontology functions as a TBox (terminology or schema) that dictates the permissible concepts, properties, domains/ranges, and entity types, while the ABox (assertional content) is populated with instance triples that must comply with these constraints (Nivas et al., 24 Dec 2025, Nayyeri et al., 2 Jun 2025, Qiang, 22 Mar 2026).

Typical architectures apply ontological guidance at multiple points:

Illustrative pipelines such as OntoRAG, RIGOR, and Wikontic exemplify these principles in domains such as technical documentation, relational databases, and open-domain question answering (Tiwari et al., 31 May 2025, Nayyeri et al., 2 Jun 2025, Chepurova et al., 29 Nov 2025).

2. Ontology Construction Methods

Ontology authoring in ontology-driven KG generation can be manual, semi-automated, or fully automated. Manual ontology design remains an option in tightly regulated domains but lacks scalability. Recent work focuses on LLM-supported or LLM-automated ontology extraction:

a) Competency question–driven design: The schema is constructed by eliciting competency questions (CQs) representing information needs. The pipeline translates CQs into classes and properties, possibly aligning or mapping to external standards (e.g., Wikidata properties via embedding matching) (Feng et al., 2024, Kommineni et al., 2024, Schimmenti et al., 13 Nov 2025).

b) Extraction from structured and unstructured data:

c) Ontology reshaping and user-in-the-loop: Existing knowledge-oriented ontologies may be transformed or pruned (e.g., via ontology reshaping) to optimize data coverage and usability in a target KG schema, under constraints of coverage, preservation, and simplicity (Zhou et al., 2022).

3. Ontology-Guided Knowledge Graph Population

Given a schema, KG population proceeds by schema-constrained extraction of (subject, predicate, object) triples from data, ensuring:

  • Subjects and objects are cast to ontology classes;
  • Predicates are restricted to ontology-declared relations;
  • Argument types and order obey property domain and range axioms.
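The three constraints above can be sketched as a simple filter over candidate triples. This is a minimal illustration, not the method of any cited pipeline: the toy schema, the entity names, and the `is_valid` helper are all hypothetical, and class membership is checked by exact match without subclass reasoning.

```python
from typing import NamedTuple

class Property(NamedTuple):
    domain: str  # class required for the subject
    range: str   # class required for the object

# Hypothetical toy schema: property name -> domain/range axiom.
SCHEMA = {
    "worksFor": Property("Person", "Organization"),
    "locatedIn": Property("Organization", "City"),
}

def is_valid(triple, entity_types, schema=SCHEMA):
    """Return True if the triple satisfies the three constraints."""
    s, p, o = triple
    if p not in schema:  # predicate must be declared in the ontology
        return False
    if s not in entity_types or o not in entity_types:  # both ends must be typed
        return False
    prop = schema[p]
    # Exact class match only; a real pipeline would also accept subclasses.
    return entity_types[s] == prop.domain and entity_types[o] == prop.range

entity_types = {"alice": "Person", "acme": "Organization", "berlin": "City"}
candidates = [
    ("alice", "worksFor", "acme"),    # valid
    ("acme", "worksFor", "alice"),    # wrong argument order
    ("alice", "marriedTo", "acme"),   # undeclared predicate
    ("acme", "locatedIn", "berlin"),  # valid
]
accepted = [t for t in candidates if is_valid(t, entity_types)]
```

Here `accepted` keeps only the first and last candidates; the other two are rejected for argument order and for an undeclared predicate, respectively.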

Prompts supply the ontology as bullet lists or JSON/Turtle fragments, with explicit instructions that only schema-permissible relations and types may be output (Mihindukulasooriya et al., 2023, Feng et al., 2024). For relational data, each record is mapped to an individual and a set of triples according to table-to-class and column-to-property mappings (Nayyeri et al., 2 Jun 2025). For unstructured text or technical documents, LLM-based extraction is combined with chunking, entity normalization, and relation reconciliation (Tiwari et al., 31 May 2025, Abolhasani et al., 2024).
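For the relational case, a table-to-class and column-to-property mapping might look like the sketch below. The mapping dictionaries, the identifier scheme `table/key`, and the `row_to_triples` helper are assumptions made for illustration; they are not taken from the cited systems.

```python
# Hypothetical mappings: table -> class, column -> property.
TABLE_TO_CLASS = {"employees": "Employee"}
COLUMN_TO_PROPERTY = {"name": "hasName", "dept": "memberOf"}

def row_to_triples(table, key_column, row):
    """Map one relational record to an individual plus its triples."""
    individual = f"{table}/{row[key_column]}"  # assumed identifier scheme
    triples = [(individual, "rdf:type", TABLE_TO_CLASS[table])]
    for column, value in row.items():
        prop = COLUMN_TO_PROPERTY.get(column)
        # Skip columns without a mapped property and NULL values.
        if prop is not None and value is not None:
            triples.append((individual, prop, value))
    return triples

triples = row_to_triples(
    "employees", "id", {"id": 7, "name": "Ada", "dept": "R&D", "badge": None}
)
```

Columns with no entry in the mapping (here `id` and `badge`) produce no triples, so the schema bounds what can enter the graph.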

A typical KG instantiation step for a TBox $\mathcal{O}=(C,P)$ and document $D$ applies:

  1. Entity extraction: Label and type phrases by $\mathcal{O}$’s classes.
  2. Triple construction: For each property $p\in P$ with domain/range $(C_i,C_j)$, seek matching entity pairs within $D$ with types $C_i,C_j$ and output $(e_i,p,e_j)$ (Feng et al., 2024, Nayyeri et al., 2 Jun 2025).
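The triple construction step can be sketched as an enumeration of type-compatible entity pairs. This is a simplified illustration under stated assumptions: entity typing (step 1) is taken as already done, the `instantiate` helper and the example data are hypothetical, and the output is a set of candidates only. A real pipeline would still need to confirm that the document actually states each relation.

```python
from itertools import permutations

def instantiate(properties, typed_entities):
    """Enumerate candidate triples whose arguments match domain and range.

    properties: {property: (domain_class, range_class)}
    typed_entities: {entity: class}, the result of the entity extraction step
    """
    triples = []
    for p, (c_i, c_j) in properties.items():
        # Ordered pairs, because domain and range are not interchangeable.
        for e_i, e_j in permutations(typed_entities, 2):
            if typed_entities[e_i] == c_i and typed_entities[e_j] == c_j:
                triples.append((e_i, p, e_j))
    return triples

properties = {"treats": ("Drug", "Disease")}
typed_entities = {"aspirin": "Drug", "headache": "Disease", "ibuprofen": "Drug"}
candidate_triples = instantiate(properties, typed_entities)
```

With this input, both drugs are paired with `headache` as the object, and no triple is produced with a disease in the subject position.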

4. Compliance Algorithms and Quality Enforcement

Ontology-compliant KGs are realized by mapping terms, pruning invalid assertions, and optimizing for internal/external schema adherence (Qiang, 22 Mar 2026). Compliance is operationalized by:

  • Term matching: Lexical, edit-distance, synonym, embedding, or topology-based mapping between KG entities/relations and ontology terms, with confidence scoring and iterative refinement (Qiang, 22 Mar 2026).
  • Pattern-based compliance: Mining of frequent ontology fragments (“patterns”) and substitution or alignment using best-matching structures from alternative ontologies (Qiang, 22 Mar 2026).
  • Refinement and pruning: Removal of low-confidence, structurally inconsistent, or axiom-violating triples; canonicalization of synonyms; enforcement of mandatory argument-type and relation constraints (Elnagar et al., 2022, Park et al., 9 Dec 2025).
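As one concrete instance of term matching with confidence scoring, the sketch below uses a purely lexical similarity from Python's standard `difflib`. The `match_term` helper and the 0.8 threshold are assumptions for illustration; the cited work also describes synonym, embedding, and topology-based matching, none of which is shown here.

```python
from difflib import SequenceMatcher

def match_term(term, ontology_terms, threshold=0.8):
    """Return (best ontology term or None, confidence) by lexical similarity."""
    def score(candidate):
        # Case-insensitive similarity ratio in [0, 1].
        return SequenceMatcher(None, term.lower(), candidate.lower()).ratio()

    best = max(ontology_terms, key=score)
    confidence = score(best)
    # Below the threshold the term is left unmapped for later refinement.
    return (best if confidence >= threshold else None), confidence

ontology_terms = ["worksFor", "locatedIn"]
mapped, confidence = match_term("works_for", ontology_terms)
unmapped, low_confidence = match_term("zzz", ontology_terms)
```

Terms that fall below the threshold are returned as `None` so that a later pass (manual review or a different matcher) can handle them, which mirrors the iterative refinement described above.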

Metrics for internal compliance (coverage $C_a$, ontology utilization $C_t$, matching rate) and combined compliance quantify the faithfulness of the KG to its schema (Qiang, 22 Mar 2026). Logical consistency checks (e.g., no violation of disjointness axioms, proper type assignments) leverage OWL reasoners or SPARQL scripts.
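A disjointness check of the kind mentioned above can be illustrated in plain Python. This is a stand-in for what an OWL reasoner or a SPARQL query would do, not a replacement for either: it only inspects explicitly asserted types and performs no subclass inference. The `disjointness_violations` helper and the example data are hypothetical.

```python
def disjointness_violations(type_assertions, disjoint_pairs):
    """Find entities asserted to belong to two classes declared disjoint.

    type_assertions: iterable of (entity, class)
    disjoint_pairs: iterable of (class_a, class_b)
    """
    classes_of = {}
    for entity, cls in type_assertions:
        classes_of.setdefault(entity, set()).add(cls)

    violations = []
    for entity, classes in sorted(classes_of.items()):
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                violations.append((entity, a, b))
    return violations

assertions = [
    ("acme", "Organization"),
    ("acme", "Person"),   # conflicts with the disjointness axiom below
    ("alice", "Person"),
]
violations = disjointness_violations(assertions, [("Person", "Organization")])
```

Each reported violation names the entity and the pair of disjoint classes, which is the information needed to prune or repair the offending type assertion.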

5. Semantic Integration, Interoperability, and Optimization

Ontology-driven KGs enable:

  • Schema alignment and federation: KGs adhering to common or mapped ontologies support merging, cross-lingual integration, or external compliance by aligning to federated ontologies using joint embedding or mapping techniques (Saba, 2023, Qiang, 22 Mar 2026).
  • Operational efficiency: Modeling optimizations, such as relocation of frequently repeated literals or elimination of unnecessary blank nodes, can yield significant storage and query gains without semantic loss; e.g., up to 38.8% triple-count reduction in a unified ODA ontology for HPC telemetry (Khan et al., 8 Jul 2025).
  • Explainability and modular scaling: Modular ontologies and pattern-based schemas facilitate transparent query answering, cross-domain transfer, and iterative extension (Norouzi et al., 2024).

The resulting KGs are deployed in triplestores (queried via SPARQL), graph databases (e.g., Neo4j), or embedding-based retrievers (e.g., FAISS or Pinecone) for downstream semantic search, RAG, or analytics (Tiwari et al., 31 May 2025, Park et al., 9 Dec 2025).

6. Evaluation Methodologies and Empirical Benchmarks

Evaluation of ontology-driven KG generation spans:

Performance bottlenecks include ontology alignment (especially for text-extracted ontologies), prompt sensitivity, and handling of highly complex or domain-specific hierarchies (Cruz et al., 8 Nov 2025, Abolhasani et al., 2024, Oyewale et al., 1 Feb 2026). Manual or judge-LLM–mediated interventions remain important for assessing semantic correctness, resolving ambiguities, and optimizing prompt schemas (Kommineni et al., 2024, Feng et al., 2024).

7. Future Directions and Open Challenges

Research is active on:

Open issues remain in large-ontology scaling, context-window–limited extraction, prompt sensitivity, and cross-domain generalization. Ongoing work addresses these through modularization, incremental building, and the use of pattern libraries and reference schema repositories.

