Ontology Creation & Expansion

Updated 21 November 2025

Ontology creation and expansion are defined as the systematic development and augmentation of formal knowledge representations that capture conceptual domains and their relationships.
Techniques range from manual curation to LLM-assisted and embedding-based methods, enabling automated extraction and dynamic integration of new data.
Hybrid approaches combining pattern-based and neural methodologies improve scalability and accuracy, fostering semantic interoperability across diverse applications.

Ontology creation and expansion refer to the development, systematic augmentation, and continual maintenance of explicit, formalized representations of conceptual domains and the relationships among their constituent entities. Ontologies serve as the semantic backbone for information integration, knowledge management, and automated reasoning across diverse domains such as artificial intelligence, organizational knowledge, scene understanding in automated vehicles, conversational agents, and scientific data exchange. State-of-the-art approaches integrate manual knowledge engineering, automated extraction from natural language and structured sources, embedding-based pattern completion, and LLM-driven concept discovery.

1. Foundations and Definitions

Ontologies are commonly formalized as triples $O = (C, R, A)$, where $C$ denotes the set of concept classes (T-Box), $R$ is the set of named relations (object properties), and $A$ is a set of axioms (constraints and logical relationships such as class disjointness and domain/range restrictions) (Elnagar et al., 2022). Central to ontology engineering is the continual expansion of $O$ with new elements of $C$, $R$, and $A$ to capture evolving conceptualizations and support open-world reasoning. In conversational AI, for instance, an ontology $O$ comprises distinct sets of user intents $I$, slots $S$, and permissible values $V$, and ontology expansion (OnExp) is the task of dynamically incorporating novel items $(I_u, S_u, V_u)$ identified in ambient user data, thereby moving beyond the closed-world assumption of static ontologies (Liang et al., 19 Oct 2024).
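The two formalizations above can be sketched as plain data structures. This is a minimal illustration, not a representation taken from the cited works: the class names `Ontology` and `DialogueOntology`, the method names, and the tuple encoding of axioms are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """O = (C, R, A): concept classes, named relations, and axioms."""
    concepts: set = field(default_factory=set)
    relations: set = field(default_factory=set)
    # Assumed encoding: axioms as tuples, e.g. ("subClassOf", "Hostel", "Accommodation").
    axioms: set = field(default_factory=set)

    def expand(self, concepts=(), relations=(), axioms=()):
        """Return a new ontology with the given items added; the original is unchanged."""
        return Ontology(
            self.concepts | set(concepts),
            self.relations | set(relations),
            self.axioms | set(axioms),
        )


@dataclass
class DialogueOntology:
    """Conversational ontology with intents I, slots S, and values V."""
    intents: set = field(default_factory=set)
    slots: set = field(default_factory=set)
    values: set = field(default_factory=set)

    def novel_items(self, intents, slots, values):
        """Return the unseen parts (I_u, S_u, V_u) of the observed items."""
        return (
            set(intents) - self.intents,
            set(slots) - self.slots,
            set(values) - self.values,
        )


base = Ontology({"Accommodation", "Hotel"}, {"locatedIn"}, set())
grown = base.expand(
    concepts={"Hostel"},
    axioms={("subClassOf", "Hostel", "Accommodation")},
)
```

Returning a new object from `expand` keeps earlier versions available for comparison, which is one simple way to track how $O$ changes over time.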

Ontology expansion (or completion) is also formalized as the prediction, ranking, or validation of plausible axioms $a^* \notin A$, where each candidate inclusion $C_i \sqsubseteq C_j$ is scored for plausibility and incorporated if consistent with domain constraints (Li et al., 25 Mar 2024).
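A minimal sketch of this score-then-check loop follows, assuming candidates arrive already scored. The function names, the 0.5 threshold, and the consistency test are illustrative assumptions; in particular, the check only looks for a class that would end up under two classes declared disjoint, which is far weaker than what a description-logic reasoner verifies.

```python
def ancestors(cls, subclass_of):
    """All superclasses reachable from cls (cls included) via asserted inclusions."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_consistent(sub, sup, subclass_of, disjoint_pairs):
    """Reject 'sub SubClassOf sup' if sub would then fall under two disjoint classes."""
    combined = ancestors(sub, subclass_of) | ancestors(sup, subclass_of)
    return not any(a in combined and b in combined for a, b in disjoint_pairs)


def complete(candidates, subclass_of, disjoint_pairs, threshold=0.5):
    """Accept scored candidates (sub, sup, score) in descending score order."""
    hierarchy = {k: set(v) for k, v in subclass_of.items()}  # work on a copy
    accepted = []
    for sub, sup, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if score >= threshold and is_consistent(sub, sup, hierarchy, disjoint_pairs):
            hierarchy.setdefault(sub, set()).add(sup)
            accepted.append((sub, sup))
    return accepted


asserted = {"Cat": {"Animal"}, "Rose": {"Plant"}}
disjoint = [("Animal", "Plant")]
scored = [("Cat", "Pet", 0.9), ("Cat", "Plant", 0.8), ("Rose", "Flower", 0.3)]
result = complete(scored, asserted, disjoint)
```

Here the second candidate is rejected because `Cat` would sit under both `Animal` and `Plant`, and the third falls below the threshold.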

2. Methodologies for Ontology Creation

Manual and Semi-Automated Curation

Traditionally, ontology creation is manual, involving domain experts authoring class hierarchies, relationships, definitions, synonyms, and references in structured tabular formats (e.g., Google Sheets) (Joachimiak et al., 3 Apr 2024). Quality is enforced via iterative rounds of curation, automated quality control using reasoners (e.g., ELK, HermiT, Pellet), and compliance checks (e.g., MIRO guidelines). Modularization is used to subdivide ontologies into interoperable domains for scalability and maintenance, as in the Space Situational Awareness Ontology (SSAO) (Rovetto et al., 2016).

Automated and LLM-Assisted Generation

Recent methodologies leverage LLMs for first-draft ontology generation. Two notable prompting techniques are Memoryless CQbyCQ (independent modeling of each competency question) and Ontogenia (metacognitive, iterative, pattern-driven expansion re-using ontology design patterns, ODPs). Both approaches translate user stories and competency questions (CQs) into OWL axioms but differ in context management and integration of prior knowledge (Lippolis et al., 7 Mar 2025). LLM-assisted proposal of new branches or refinement of class hierarchies is adopted in Artificial Intelligence Ontology (AIO) construction via prompt templates and few-shot examples, with subsequent human review and integration (Joachimiak et al., 3 Apr 2024).
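The difference in context management can be illustrated with a small prompt-building sketch. It is not the published CQbyCQ or Ontogenia procedure: the prompt wording, the function names, and the `model` callable (a stand-in for any text-generation function) are hypothetical, and the second function shows only the idea of carrying the draft forward, without the metacognitive steps or design-pattern reuse.

```python
def memoryless_prompts(user_story, competency_questions):
    """One independent prompt per competency question; no state is carried over."""
    return [
        f"Story: {user_story}\nCompetency question: {cq}\n"
        "Write OWL axioms that answer it."
        for cq in competency_questions
    ]


def context_carrying_run(user_story, competency_questions, model):
    """Each prompt includes the draft so far, so later steps can reuse earlier axioms."""
    draft, prompts = "", []
    for cq in competency_questions:
        prompt = (
            f"Story: {user_story}\nCurrent draft:\n{draft or '(empty)'}\n"
            f"Competency question: {cq}\nExtend the draft with OWL axioms."
        )
        prompts.append(prompt)
        draft += model(prompt) + "\n"  # model is a hypothetical callable: str -> str
    return prompts, draft
```

With a stub such as `lambda prompt: "Class: Hotel"`, the second prompt of the context-carrying run already contains the first answer, whereas every memoryless prompt sees only the story and its own question.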

Extraction from Unstructured Text and Knowledge Graphs

Automated frameworks transform unstructured corpora $T = \{t_1, \ldots, t_n\}$ into preliminary knowledge graphs (KGs) $G = (V, E, \sigma)$, followed by refinement, anomaly exclusion, and mapping into target ontologies $O$ via consistency checks with reference ontologies, domain alignment, and axiom completion (Elnagar et al., 2022). Refinement applies techniques such as filtering by confidence thresholds, exclusion via disjointness axioms, embedding-based link prediction (e.g., ComplEx), and completion using GNNs or relation prediction (Elnagar et al., 2022, Li et al., 25 Mar 2024). For instance, in a domain like hotel information extraction, rule-based and pattern-matching methods are supplemented by ontology-guided semantic triple extraction and RDF integration (Anantharangachar et al., 2013).
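Two of the refinement steps, confidence filtering and exclusion via disjointness axioms, can be sketched as follows. The triple layout `(subject, predicate, object, confidence)`, the `refine` function, the 0.7 threshold, and the hotel data are assumptions for illustration and do not come from the cited frameworks.

```python
def refine(triples, types, disjoint_pairs, min_confidence=0.7):
    """Keep confident triples whose subject and object are typed consistently.

    triples: list of (subject, predicate, object, confidence)
    types: dict mapping entity -> set of class names
    disjoint_pairs: list of (class_a, class_b) declared disjoint
    """
    def inconsistent(entity):
        classes = types.get(entity, set())
        return any(a in classes and b in classes for a, b in disjoint_pairs)

    kept = []
    for s, p, o, confidence in triples:
        if confidence < min_confidence:
            continue  # filtered by the confidence threshold
        if inconsistent(s) or inconsistent(o):
            continue  # excluded as an anomaly under a disjointness axiom
        kept.append((s, p, o))
    return kept


extracted = [
    ("GrandHotel", "locatedIn", "Paris", 0.95),
    ("GrandHotel", "hasPool", "true", 0.40),
    ("Ritz", "locatedIn", "Paris", 0.90),
]
entity_types = {
    "GrandHotel": {"Hotel"},
    "Paris": {"City"},
    "Ritz": {"Hotel", "City"},  # conflicting types from a noisy extractor
}
clean = refine(extracted, entity_types, [("Hotel", "City")])
```

Only the first triple survives: the second is below the threshold, and the third involves an entity typed with two disjoint classes.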

Hybrid and Embedding-Based Completion

Ontology expansion benefits from combining pattern-based inference (GNNs) with textual inference (NLI/LLM models). Embedding-based methods generalize witnessed rule templates using contextualized vectors and GNNs (GCN, GAT) for “parallel rule” inference, while NLI/LLM models leverage verbalizations and world knowledge to score arbitrary candidate axioms. Hybrid fallback or weighted-sum strategies combine both and achieve higher F₁ scores than either approach alone, particularly when template coverage is incomplete or domain specificity is high (Li et al., 25 Mar 2024).
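The two combination strategies can be sketched in a few lines. This is a simplification that folds fallback and weighted sum into one function; the names `hybrid_score` and `rank_candidates`, the use of `None` to mark a candidate that no rule template covers, and the default weight of 0.5 are assumptions rather than details of the cited method.

```python
def hybrid_score(pattern_score, text_score, weight=0.5):
    """Weighted sum when both scores exist; fall back to the text score otherwise."""
    if pattern_score is None:  # no rule template covered this candidate
        return text_score
    return weight * pattern_score + (1.0 - weight) * text_score


def rank_candidates(scored, weight=0.5):
    """Sort candidate axioms by hybrid score, best first.

    scored: dict mapping a candidate axiom to (pattern_score or None, text_score)
    """
    return sorted(
        scored,
        key=lambda axiom: hybrid_score(*scored[axiom], weight=weight),
        reverse=True,
    )


candidates = {
    "Hostel SubClassOf Accommodation": (0.9, 0.5),
    "Ryokan SubClassOf Accommodation": (None, 0.8),
    "Hostel SubClassOf Vehicle": (0.2, 0.4),
}
ranking = rank_candidates(candidates)
```

The fallback branch is what keeps candidates rankable when template coverage is incomplete, which is the situation where the text above reports the largest benefit from hybridization.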

3. Ontology Expansion Techniques and Workflows

Conversational Ontology Expansion

In conversational systems, expansion is categorized as:

New Intent Discovery (NID): Clustering or classification of utterances to detect unseen intents; includes unsupervised, zero-shot, and semi-supervised approaches, with models such as DEC, label-aware BERT attention, and LLM-based OOD detection (Liang et al., 19 Oct 2024).
New Slot-Value Discovery (NSVD): Sequence tagging, clustering, and prototypical contrastive learning methods for slot and value identification, often under partial supervision.
Joint OnExp: Unified prediction of $(o^I, o^S, o^V)$ that leverages shared representations; this remains an area for further research.

Evaluation employs metrics such as accuracy after cluster alignment, Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Span-F1 for sequence labeling (Liang et al., 19 Oct 2024).
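Two of these metrics can be written with the standard library alone. The sketch below is illustrative: accuracy after cluster alignment is computed by brute force over one-to-one mappings (practical only for a handful of clusters; the Hungarian algorithm is the usual choice at scale), and the Adjusted Rand Index follows the standard pair-counting formula. The function names are assumptions.

```python
from collections import Counter
from itertools import permutations
from math import comb


def clustering_accuracy(true_labels, cluster_ids):
    """Best accuracy over one-to-one mappings from cluster ids to true labels."""
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    overlap = Counter(zip(cluster_ids, true_labels))
    # Pad so every cluster can be mapped even when clusters outnumber labels.
    padded = labels + [None] * max(0, len(clusters) - len(labels))
    best = 0
    for assignment in permutations(padded, len(clusters)):
        hits = sum(overlap[(c, label)] for c, label in zip(clusters, assignment))
        best = max(best, hits)
    return best / len(true_labels)


def adjusted_rand_index(true_labels, cluster_ids):
    """ARI from the contingency table; requires at least two items."""
    n = len(true_labels)
    cells = sum(comb(v, 2) for v in Counter(zip(true_labels, cluster_ids)).values())
    rows = sum(comb(v, 2) for v in Counter(true_labels).values())
    cols = sum(comb(v, 2) for v in Counter(cluster_ids).values())
    expected = rows * cols / comb(n, 2)
    maximum = (rows + cols) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (cells - expected) / (maximum - expected)


gold = ["book", "book", "cancel", "cancel"]
predicted = [0, 0, 0, 1]
acc = clustering_accuracy(gold, predicted)
ari = adjusted_rand_index(gold, predicted)
```

Alignment is needed because cluster ids are arbitrary: a clustering that recovers the intents perfectly may still number them differently from the gold labels.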

Scene Generation in Automated Vehicles

Domain-specific expansion in traffic scene modeling uses modular OWL ontologies enriched with SWRL rules for spatial, behavioral, and regulatory constraints. Scene generation enumerates infrastructure layouts, infers spatial relationships, recursively places participants, computes maneuvers, and filters according to domain rules. Expansion is quantified by the number and diversity of valid scenes generated while maintaining semantic constraints (e.g., 1,016 scenarios for a three-lane motorway with participants under constraint filtering) (Bagschik et al., 2017).
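The enumerate-then-filter pattern can be illustrated with a toy example. It is not the cited ontology-based procedure: the lane/slot grid, the participant names, and the rule `truck_avoids_lane` are hypothetical stand-ins for what OWL classes and SWRL rules would express, and the counts bear no relation to the 1,016 scenarios reported above.

```python
from itertools import permutations, product


def generate_scenes(n_lanes, n_slots, participants, rules=()):
    """Enumerate placements of participants on a lane/slot grid; keep those passing all rules."""
    cells = list(product(range(n_lanes), range(n_slots)))
    scenes = []
    # permutations() gives each participant a distinct cell.
    for placement in permutations(cells, len(participants)):
        scene = dict(zip(participants, placement))  # participant -> (lane, slot)
        if all(rule(scene) for rule in rules):
            scenes.append(scene)
    return scenes


def truck_avoids_lane(lane):
    """Build a hypothetical rule: the participant named 'truck' may not use the given lane."""
    return lambda scene: scene["truck"][0] != lane


unfiltered = generate_scenes(3, 2, ["ego", "truck"])
filtered = generate_scenes(3, 2, ["ego", "truck"], rules=[truck_avoids_lane(2)])
```

On this three-lane, two-slot grid there are 30 unconstrained placements and 20 once the truck is excluded from lane 2, which shows how each added rule shrinks the scene set while keeping every remaining scene valid.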

Image Ontology Engineering

Tools such as ImageSpace provide integrated environments for ontology creation, visualization, annotation, and consistency assurance under DAML+OIL. Incremental expansion is supported via GUI-driven addition of classes/properties, cardinality and domain/range operations, and direct annotation update to relational stores (0902.2953).

Core and Modularity Principles

Cases such as the Audiovisual Analytics Vocabulary and Ontology (AAVO) emphasize minimal, theory-aligned cores extended by modular domain-specific additions, linked via OWL and SKOS. Best practices involve explicit separation of core and extension, rigorous expert validation, and adoption of Linked Data standards (Fabbri et al., 2017).

4. Quality Assurance, Metrics, and Evaluation

Ontology development and expansion are governed by explicit metrics:

Coverage: $\mathrm{Coverage}(O, D) = \frac{|\{c \in D: c\text{ matches label/synonym in }O\}|}{|D|}$, measuring the share of candidate concepts in $D$ that the ontology already represents.
Precision/Recall for Extraction: Precision $\frac{|T|}{|N|}$ and recall $\frac{|T|}{|M|}$, where $N$ is the set of proposed concepts, $T$ the set of accepted concepts, and $M$ the set of expert-agreed new concepts (Joachimiak et al., 3 Apr 2024).
Expansion Rate: $\frac{|O(t_2)| - |O(t_1)|}{t_2 - t_1}$, quantifying class addition over time.
Consistency: Unsatisfiable class count $U(O)$ , validated to be zero under reasoners.
Superfluous-Element Rates: $SR_{\text{classes}} = C_{\text{super}}/C_{\text{total}}$, assessing ontological noise (unused entities) in LLM-generated drafts (Lippolis et al., 7 Mar 2025).
Ontology Completion Benchmarks: F₁-score, Precision@k, MRR, combining GNN and NLI rankings (Li et al., 25 Mar 2024).
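Several of the metrics listed above reduce to simple ratios, sketched here with the standard library. The function names and the case-insensitive matching in `coverage` are assumptions; the formulas themselves follow the definitions in the list, and `mean_reciprocal_rank` uses the standard definition of MRR.

```python
def coverage(ontology_labels, candidates):
    """Share of candidate concepts in D that match a label or synonym in O."""
    known = {label.lower() for label in ontology_labels}  # assumed: case-insensitive match
    return sum(1 for c in candidates if c.lower() in known) / len(candidates)


def precision_recall(proposed, accepted, expert_agreed):
    """Precision |T|/|N| and recall |T|/|M| for proposed (N), accepted (T), expert-agreed (M)."""
    return len(accepted) / len(proposed), len(accepted) / len(expert_agreed)


def expansion_rate(size_t1, size_t2, t1, t2):
    """Classes added per unit of time between t1 and t2."""
    return (size_t2 - size_t1) / (t2 - t1)


def superfluous_rate(n_superfluous, n_total):
    """Fraction of elements that are never used, e.g. for classes in an LLM draft."""
    return n_superfluous / n_total


def mean_reciprocal_rank(rankings, gold):
    """MRR: rankings[q] is an ordered candidate list, gold[q] the correct item."""
    total = 0.0
    for query, ranked in rankings.items():
        if gold[query] in ranked:
            total += 1.0 / (ranked.index(gold[query]) + 1)
    return total / len(rankings)


cov = coverage(["Neural Network", "CNN"], ["cnn", "Transformer"])
prec, rec = precision_recall(list(range(10)), list(range(4)), list(range(8)))
rate = expansion_rate(100, 160, 0, 12)
mrr = mean_reciprocal_rank(
    {"q1": ["a", "b"], "q2": ["c", "d", "e"]},
    {"q1": "b", "q2": "c"},
)
```

For example, 4 accepted concepts out of 10 proposed and 8 expert-agreed give a precision of 0.4 and a recall of 0.5, and growth from 100 to 160 classes over 12 time units gives an expansion rate of 5 classes per unit.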

Automated reasoners, human expert validation, and competency question coverage are standard for quality control, augmented by reporting of critical pitfalls (e.g., OOPS! framework error classes) (Lippolis et al., 7 Mar 2025).

5. Challenges, Limitations, and Best Practices

Common challenges in ontology creation and expansion include ambiguity in concept and property semantics, integration of noisy or domain-mismatched data, control of “ontology bloat” and superfluous elements, and scalability for large instance sets (Li et al., 25 Mar 2024, Elnagar et al., 2022, Lippolis et al., 7 Mar 2025). LLMs, while enabling rapid ontology generation, can yield mistakes such as incorrect inverse relationship modeling, redundant domains/ranges, duplicate elements, and incomplete handling of reified relations.

Best practices synthesized across domains:

Anchor ontologies in well-defined objectives and competency questions.
Adopt modular architectures for domain separation and maintenance.
Automate and validate expansion through continuous integration pipelines, reasoner checks, and regular MIRO-compliance audits (Joachimiak et al., 3 Apr 2024).
Maintain human-in-the-loop review for critical integrations and edge cases.
Record ontology evolution through version control and detailed change logs (Rovetto et al., 2016).
Leverage hybrid modeling (pattern-based and LLM/NLI-based) for maximal coverage and plausibility (Li et al., 25 Mar 2024).
Ensure cross-disciplinary interoperability using multiple serializations, API-driven mappings (e.g., BioPortal, OAK), and alignment to external standards (Joachimiak et al., 3 Apr 2024, Fabbri et al., 2017).

6. Future Directions and Emerging Trends

Research trends emphasize further hybridization of LLM-driven and embedding-based expansion techniques, fine-tuning on minimal ontology modules to reduce superfluity, multi-modal ontology expansion (incorporating signals beyond text), and automation of cross-ontology mappings and maintenance (Lippolis et al., 7 Mar 2025, Liang et al., 19 Oct 2024).

For conversational understanding, early-stage detection and absorption of new intents/slots from minimal evidence, and holistic evaluation incorporating downstream dialog performance, represent current frontiers (Liang et al., 19 Oct 2024).

Overall, the discipline is converging toward frameworks that balance human expertise with automated, explainable, and scalable augmentation, facilitating semantic interoperability and adaptive knowledge management across scientific, technical, and organizational domains.