Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems

Published 22 Apr 2026 in cs.AI | (2604.20795v1)

Abstract: This paper presents a hybrid architecture for intelligent systems in which LLMs are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval (RAG), the proposed approach constructs and maintains a structured knowledge graph using RDF/OWL representations, enabling persistent, verifiable, and semantically grounded reasoning. The core contribution is an automated pipeline for ontology construction from heterogeneous data sources, including documents, APIs, and dialogue logs. The system performs entity recognition, relation extraction, normalization, and triple generation, followed by validation using SHACL and OWL constraints, and continuous graph updates. During inference, LLMs operate over a combined context that integrates vector-based retrieval with graph-based reasoning and external tool interaction. Experimental observations on planning tasks, including the Tower of Hanoi benchmark, indicate that ontology augmentation improves performance in multi-step reasoning scenarios compared to baseline LLM systems. In addition, the ontology layer enables formal validation of generated outputs, transforming the system into a generation-verification-correction pipeline. The proposed architecture addresses key limitations of current LLM-based systems, including lack of long-term memory, weak structural understanding, and limited reasoning capabilities. It provides a foundation for building agent-based systems, robotics applications, and enterprise AI solutions that require persistent knowledge, explainability, and reliable decision-making.


Authors (2)

Summary

The paper introduces a neuro-symbolic framework that combines LLMs with structured ontological graphs to achieve persistent and verifiable memory.
It combines RDF/OWL, SPARQL, and SHACL for semantic representation, querying, and validation with vector retrieval, and reports improved planning and regulatory QA performance.
Empirical results in planning tasks and compliance assessments highlight benefits while also revealing challenges in schema alignment and error propagation.

Neuro-Symbolic Automatic Ontology Construction for Hybrid Intelligent Systems

Problem Setting and Theoretical Motivation

The paper addresses the limitations of purely vector-based LLM architectures, specifically their inability to persistently store, structure, and verify dynamic world models beyond their static weights and ephemeral context window. Unlike humans, who compress text into structured relational knowledge, LLMs operate with highly local context and lack robust long-term and verifiable memory representation. This is especially consequential in agent-based and robotic systems requiring cumulative, temporal, and explainable knowledge across sessions, decision histories, user profiles, and complex planning problems.

To bridge this gap, the paper formalizes a neuro-symbolic hybrid architecture in which an LLM is not the sole knowledge store but instead orchestrates, interprets, and augments an external memory composed of an ontological graph (in RDF/OWL), vector RAG storage, SHACL validators, SPARQL query interfaces, dialogue logs, and associated embeddings. This design enables the automatic construction and management of ontologies from heterogeneous textual and interaction data, thus transitioning the system from a retrieval overlay to a structured, consistent, and explainable world model.

The methodological backbone draws from the W3C standards stack, with RDF/OWL for typed graphs, SPARQL for declarative querying, and SHACL for graph-shape constraints. Recent literature (Bian, 23 Oct 2025, Kommineni et al., 2024, Lippolis et al., 7 Mar 2025, Nayyeri et al., 2 Jun 2025, Feng et al., 2024) demonstrates substantial progress in semi- and fully-automated ontology construction via LLMs, including extraction, alignment, and validation of entities, relations, and axioms. While LLMs can accelerate ontology population and schema draft generation, challenges remain in verification, normalization, and error containment.

Concurrently, the GraphRAG literature (Edge et al., 2024, Han et al., 2024) reveals that augmenting retrieval with explicit graph-structured context significantly improves performance on tasks requiring global reasoning, as local embedding similarity alone fails to support sensemaking and causality. Context-centric and multi-agentic AI frameworks (Khurdula et al., 4 Feb 2026, Belcak et al., 2 Jun 2025) further endorse the separation of memory, reasoning, and action layers, leveraging small, specialized models and agent orchestration atop persistent symbolic foundations.

System Architecture and Pipeline

The system uses dual memory: a symbolic graph $\mathcal{G}$ (entities, relations, axioms) and a vector store $\mathcal{V}$ (dense retrieval over text and logs). The Model Context Protocol (MCP) acts as the orchestration bus, bridging the LLM, the ontological graph, vector RAG storage, APIs, agent layer, tool interfaces, and logs. For a query $q$, context assembly fuses results from vector retrieval $R_\text{vect}(q)$, SPARQL graph queries $R_\text{graph}(q)$, tool/API calls $R_\text{tool}(q)$, and user/session memory $M_\text{user}$.
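As a rough illustration of the context assembly step, the sketch below fuses the four sources into one ordered, de-duplicated context. The summary does not specify the fusion policy, so the ordering (graph facts first, then tool results, vector hits, and user memory), the case-insensitive de-duplication, the `max_items` cutoff, and all names (`ContextItem`, `assemble_context`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ContextItem:
    source: str  # one of "graph", "tool", "vector", "user"
    text: str

# A retriever maps a query string to a list of text snippets (assumed interface).
Retriever = Callable[[str], List[str]]

def assemble_context(
    query: str,
    r_vect: Retriever,
    r_graph: Retriever,
    r_tool: Retriever,
    m_user: List[str],
    max_items: int = 8,
) -> List[ContextItem]:
    """Fuse R_vect(q), R_graph(q), R_tool(q), and M_user into one context list."""
    # Assumed priority: validated graph facts outrank looser vector hits.
    sources = (
        ("graph", r_graph(query)),
        ("tool", r_tool(query)),
        ("vector", r_vect(query)),
        ("user", list(m_user)),
    )
    items: List[ContextItem] = []
    seen = set()
    for source, texts in sources:
        for text in texts:
            key = text.strip().lower()
            if key and key not in seen:  # drop empty and duplicate snippets
                seen.add(key)
                items.append(ContextItem(source, text.strip()))
    return items[:max_items]

def render(items: List[ContextItem]) -> str:
    """Format the fused context as source-tagged lines for the LLM prompt."""
    return "\n".join(f"[{item.source}] {item.text}" for item in items)
```

Tagging each snippet with its source keeps the provenance visible to the model and to any later verification step; a real system would likely rank by relevance scores rather than a fixed source order.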

The Ontology Builder pipeline automates the loop:

Ingest heterogeneous documents/dialogues, segment, and index.
NER/entity typing to extract candidate terms.
Relation extraction to hypothesize predicates and dependencies.
Normalization/alignment for consistency across sources.
Triple construction for structured serialization.
SHACL/OWL-based validation and reasoning to enforce constraints, consistency, and inheritance.
Storage as versioned TTL/RDF artifacts in the triple store.
Closed-loop enrichment: extracted facts from LLM generations feed back into the ontology pending validation.
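The normalization, triple construction, and validation steps listed above can be pictured with a small standard-library sketch. It is not the paper's implementation: the alias table, the `ex:` identifiers, and the dictionary-based domain/range check are hypothetical stand-ins for entity alignment and for SHACL/OWL validation, which a real pipeline would delegate to a triple store and a dedicated validator.

```python
from typing import Dict, List, NamedTuple, Tuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical alias table standing in for the normalization/alignment step.
ALIASES: Dict[str, str] = {
    "acme corp.": "ex:Acme",
    "acme": "ex:Acme",
    "j. smith": "ex:JohnSmith",
}

# Toy constraint table standing in for SHACL shapes:
# predicate -> (required subject type, required object type).
SHAPES: Dict[str, Tuple[str, str]] = {
    "ex:employs": ("ex:Organization", "ex:Person"),
}

def normalize(mention: str) -> str:
    """Map a surface mention to a canonical identifier (assumed scheme)."""
    key = " ".join(mention.lower().split())
    return ALIASES.get(key, "ex:" + "_".join(mention.split()))

def build_triples(relations: List[Tuple[str, str, str]]) -> List[Triple]:
    """Turn extracted (subject mention, predicate, object mention) into triples."""
    return [Triple(normalize(s), p, normalize(o)) for s, p, o in relations]

def validate(
    triples: List[Triple], types: Dict[str, str]
) -> Tuple[List[Triple], List[str]]:
    """Accept triples whose subject/object types match the predicate's shape."""
    accepted: List[Triple] = []
    violations: List[str] = []
    for t in triples:
        shape = SHAPES.get(t.predicate)
        if shape is None:
            violations.append(f"{t}: no shape for predicate {t.predicate}")
            continue
        domain, range_ = shape
        if types.get(t.subject) != domain:
            violations.append(f"{t}: subject is not a {domain}")
        elif types.get(t.obj) != range_:
            violations.append(f"{t}: object is not a {range_}")
        else:
            accepted.append(t)
    return accepted, violations
```

Only accepted triples would be written to the versioned store; the violations list is the natural input for the human-in-the-loop review or for a correction prompt.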

This pipeline enables robust, incremental, and reproducible knowledge engineering, mitigating manual labor while maintaining schema quality via human-in-the-loop governance, as emphasized in recent benchmarking surveys (Bian, 23 Oct 2025, Lippolis et al., 7 Mar 2025).

Empirical Results and Evidence

The empirical evaluation uses a Tower of Hanoi planning benchmark and a qualitative regulatory compliance QA task (Fact Analyzer). In the planning domain, augmenting Qwen3-Max with the ontology raises success rates for 3 disks (26.3% → 33.3%) and 5 disks (33.3% → 45.5%), with parity at 4 disks and failure at 6 disks. This is consistent with the claim that symbolic world modeling and constraint checking bolster autoregressive LLMs, particularly in the 'medium-complexity corridor' where the plan tracking and fidelity of pure LLMs degrade (Kambhampati et al., 2024, Nayyeri et al., 2 Jun 2025).
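To make the idea of constraint checking on plans concrete, the following sketch validates a Tower of Hanoi move sequence against the puzzle rules and reports the first violated constraint. The state encoding (three pegs indexed 0 to 2, disks numbered by size) and the function names are assumptions for illustration; the paper's ontology-based encoding is not described in this summary. The reference solver relies on the well-known fact that the optimal solution for $n$ disks has $2^n - 1$ moves.

```python
from typing import List, Optional, Tuple

Move = Tuple[int, int]  # (source peg, destination peg), pegs indexed 0..2

def check_hanoi_plan(
    n_disks: int, moves: List[Move], source: int = 0, target: int = 2
) -> Optional[str]:
    """Return None if the plan is valid and reaches the goal, else a reason."""
    pegs: List[List[int]] = [[], [], []]
    pegs[source] = list(range(n_disks, 0, -1))  # largest disk at the bottom
    for step, (src, dst) in enumerate(moves, start=1):
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return f"step {step}: invalid move {src}->{dst}"
        if not pegs[src]:
            return f"step {step}: peg {src} is empty"
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return f"step {step}: disk {disk} on smaller disk {pegs[dst][-1]}"
        pegs[dst].append(pegs[src].pop())
    if len(pegs[target]) != n_disks:
        return "goal not reached: not all disks are on the target peg"
    return None

def solve_hanoi(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Classic recursive reference solution with 2**n - 1 moves."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, aux, dst)
        + [(src, dst)]
        + solve_hanoi(n - 1, aux, dst, src)
    )
```

A checker of this kind turns a plausible-looking LLM plan into a pass/fail signal with a pinpointed step, which is the kind of feedback a verification layer can return to the generator.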

In regulatory QA, the ontological layer enables strict logical verification. The system distinguishes between supported and contradicted answers, verifying generated claims against structured regulatory rules, thus providing explainable neuro-symbolic QA rather than plausibility-driven text generation.
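The supported/contradicted distinction can be sketched as a lookup of each generated claim against a rule base. The example below is a toy under explicit assumptions: claims are already expressed as (subject, predicate, value) triples, every predicate in the rule base is single-valued so a differing value counts as a contradiction, and the rule identifiers are invented. A third verdict, unverified, is added here for claims the rule base does not cover; the summary does not say how the paper handles that case.

```python
from enum import Enum
from typing import Dict, Tuple

class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNVERIFIED = "unverified"  # assumption: rule base is silent on the claim

Claim = Tuple[str, str, str]  # (subject, predicate, value)

# Hypothetical regulatory rule base with single-valued predicates.
RULES: Dict[Tuple[str, str], str] = {
    ("ex:RetentionPolicy", "ex:maxRetentionDays"): "30",
    ("ex:IncidentReport", "ex:deadlineHours"): "72",
}

def verify_claim(claim: Claim, rules: Dict[Tuple[str, str], str]) -> Verdict:
    """Compare one extracted claim with the structured rule base."""
    subject, predicate, value = claim
    expected = rules.get((subject, predicate))
    if expected is None:
        return Verdict.UNVERIFIED
    return Verdict.SUPPORTED if expected == value else Verdict.CONTRADICTED

def explain(claim: Claim, rules: Dict[Tuple[str, str], str]) -> str:
    """Produce a short, human-readable justification for the verdict."""
    verdict = verify_claim(claim, rules)
    expected = rules.get((claim[0], claim[1]))
    if verdict is Verdict.UNVERIFIED:
        return f"{verdict.value}: no rule for {claim[0]} {claim[1]}"
    return f"{verdict.value}: rule says {expected}, answer says {claim[2]}"
```

Because every verdict points at a specific rule, the answer can be accompanied by the evidence that supports or contradicts it rather than by plausibility alone.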

Nevertheless, the evaluation lacks a rigorous protocol specification: sample sizes, prompt templates, and aggregation metrics are undefined, and there is no full-fledged reproducible benchmark suite. The results are therefore best interpreted as an architectural demonstration rather than final statistical proof.

Practical and Theoretical Implications

The implications of this architecture are extensive:

Structured Long-Term Memory: Ontological graphs enable persistent, verifiable, and personalized memory for multi-session and agentic systems, in contrast to ephemeral chat histories.
Explainable Reasoning: SHACL and OWL validation afford logical traceability and post-hoc validation, supporting regulatory compliance and trustworthy QA.
Agentic Orchestration: Decomposition into LLM generation, symbolic planning/verification, and agent-based tool invocation realizes robust and interpretable hybrid AI stacks.
Data Integration: The ontology serves as a semantic layer for federation of documents, APIs, tables, logs, and world-state across heterogeneous sources, a prerequisite for robust digital twin and robot autonomy scenarios.
Scalable Automation: Semi-automated pipelines reduce, but do not eliminate, human curation: alignment, schema drift, and hallucination risks persist, necessitating ongoing governance.
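The decomposition into LLM generation and symbolic verification can be read as the generation-verification-correction loop mentioned in the abstract. A minimal sketch of such a loop follows, assuming the generator and verifier are supplied as plain callables; the signatures, the feedback format (a list of violation messages), and the `max_rounds` budget (assumed to be at least 1) are choices made for this example, not details taken from the paper.

```python
from typing import Callable, List, Tuple

# Assumed interfaces: the generator sees the task plus the latest violations;
# the verifier returns an empty list when the candidate passes all checks.
Generate = Callable[[str, List[str]], str]
Verify = Callable[[str], List[str]]

def generate_verify_correct(
    task: str, generate: Generate, verify: Verify, max_rounds: int = 3
) -> Tuple[str, List[str], int]:
    """Return (last candidate, remaining violations, rounds used)."""
    feedback: List[str] = []
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        violations = verify(candidate)
        if not violations:
            return candidate, [], round_no
        feedback = violations  # fed back to the generator as correction hints
    return candidate, feedback, max_rounds
```

In use, `generate` would wrap an LLM call and `verify` a constraint check against the ontology; a non-empty violation list after the final round signals that the output should be escalated rather than trusted.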

These implications are consistent with recent advocacy for modular neuro-symbolic architectures in LLM-modulo, context-centric, and agentic AI research (Kambhampati et al., 2024, Belcak et al., 2 Jun 2025, Khurdula et al., 4 Feb 2026).

Limitations and Future Directions

Automatic ontology construction remains prone to error propagation, schema misalignment, and hallucination. Validation introduces non-trivial orchestration and latency costs. Complete elimination of manual schema and corpus governance remains infeasible. Further, empirical evidence here is qualitative/descriptive rather than benchmarked at scale.

Future research will be required to:

Develop reproducible, large-scale planning and reasoning benchmarks for ontology-augmented models.
Extend ontologies with temporal and version-tracking capabilities for non-monotonic domains.
Implement probabilistic and trust layers on top of symbolic triples.
Deepen integration between persistent graph memory and action-oriented tool-using agents, especially in robotics and continual learning.
Address open challenges in alignment, normalization, and continuous schema evolution.

Conclusion

The paper demonstrates that incorporating LLMs as orchestrators, verifiers, and planners atop an ontology-centric hybrid architecture substantively enhances system memory, explainability, and planning capabilities in agentic AI. The ontology not only improves retrieval but also enforces structure, validation, and long-term persistence, making the transition from text overlays to explicit world modeling feasible. This architectural paradigm, integrating MCP, RDF/OWL, vector RAG, agent layers, and formal verification, aligns with prominent trajectories in neuro-symbolic and agentic AI research. While operational challenges remain, this direction offers a robust pathway toward explainable, persistent, and verifiable hybrid intelligent systems.
