Ontology-based Query Check (OBQC)

Updated 17 November 2025

OBQC is a framework that uses logic-based techniques, including UCQ rewriting and chase-based methods, to certify and validate queries against ontological constraints.
It constructs finite test suites via injective instantiation to verify reasoner soundness, completeness, and semantic correctness in diverse settings.
OBQC is applied in areas like Semantic Web reasoning, OBDA, and LLM-generated SPARQL validation, enhancing query repair and system certification.

Ontology-based Query Check (OBQC) refers to an array of logic-based techniques for verifying, certifying, or validating properties of queries—typically conjunctive (CQ), unions of conjunctive (UCQ), or SPARQL queries—in the presence of ontologies or ontological constraints. OBQC addresses decidability, completeness, expressibility, and semantic correctness issues across several Semantic Web, database, and knowledge graph settings, including software reasoner certification, LLM-generated query validation, and ontology-based query rewriting.

1. Foundational Definitions and Formal Models

OBQC frameworks are grounded in first-order logic, Description Logics (DLs), and database theory. The general OBQC setup includes:

Signature $\Sigma=(\mathsf{P}, \mathsf{I})$ : a set $\mathsf{P}$ of predicate symbols and a set $\mathsf{I}$ of individual constants.
Ontology (TBox) $T$ : finite set of axioms, often in DL or existential rule form.
Data (ABox) $A$ : finite set of ground facts.
Query $Q$ : given as an FOL formula or, in the SPARQL case, as a Basic Graph Pattern over the KG; it may have answer variables (non-Boolean).
Mappings (GAV, for OBDA setups): rules that define ontology predicates as views over the source schema.

A reasoner $\mathcal{R}$ is modeled as a function $(Q,T,A)\mapsto$ either the tuple set of answers or an "unsat" flag, with the following essential properties:

Soundness: returned answers are entailed;
Monotonicity: adding facts or axioms does not remove answers;
Renaming invariance: consistently (injectively) renaming the individuals in the ABox changes the answer set only by the same renaming.

Fundamental OBQC problems include:

Completeness check: Is $\mathcal{R}$ complete for query $Q$ and ontology $T$ , i.e., $\mathcal{R}(Q, T, A) \supseteq \mathit{cert}(Q, T, A)$ for every $A$ ?
Expressibility and verification: For mapping scenarios, does there exist a $q_t$ over an ontology/mapping such that for all source DBs $D$ , $\mathit{ans}_{q_s}(D) = \mathit{cert}_{(\mathcal{O}, \mathit{sch}(\mathbf{M}), q_t)}(\mathbf{M}(D))$ ?
Semantic validation (SPARQL/KG): Does an LLM-generated query $Q$ align with the semantic constraints imposed by ontology $O$ ?

2. Logic-based Approach to Reasoner Certification

For DL and OWL-based settings, OBQC techniques enable automated, finite certification of reasoner completeness for given $(Q,T)$ pairs, even when the reasoner is not, in general, complete (Grau et al., 2014).

Test Suite Construction

Key steps:

UCQ or Datalog $^{\pm}$ rewriting: Compute a rewriting $R$ of query $Q$ w.r.t. ontology $T$ , yielding rule sets $R_{\bot}, R_Q$ (UCQ), or $(R_D, R_{\bot}, R_Q)$ (recursion-capable).
Injective instantiation: For each rule $r$ in $R$ , ground the body using fresh constants to form small test ABoxes. Heads yielding $Q$ are paired with queries $(A_r, Q)$ ; heads yielding $\bot$ become unsat tests.
Test suite $S = (S_{\bot}, S_Q)$ : $S_{\bot}$ covers unsatisfiable ABoxes; $S_Q$ covers answer preservation.
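
The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the cited work: the rule encoding (tuples with `?`-prefixed variables), the head markers `"Q"` and `"BOT"`, and the three example rules are all assumptions made for the example.

```python
from itertools import count

def is_var(term):
    # Convention assumed here: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")

def injective_instantiation(body, fresh):
    """Ground a rule body, sending each distinct variable to a distinct fresh constant."""
    mapping = {}
    abox = set()
    for pred, *args in body:
        ground = []
        for t in args:
            if is_var(t):
                if t not in mapping:
                    mapping[t] = f"c{next(fresh)}"
                ground.append(mapping[t])
            else:
                ground.append(t)
        abox.add((pred, *ground))
    return abox, mapping

def build_test_suite(rewriting):
    """Split instantiated rules into unsat tests (S_bot) and answer tests (S_Q)."""
    fresh = count(1)
    s_bot, s_q = [], []
    for body, head in rewriting:
        abox, mapping = injective_instantiation(body, fresh)
        if head[0] == "BOT":
            s_bot.append(abox)                      # reasoner must report unsat
        else:
            expected = tuple(mapping.get(v, v) for v in head[1:])
            s_q.append((abox, expected))            # reasoner must return this tuple
    return s_bot, s_q

# Hypothetical rewriting of a query Q(x) with two answer rules and one unsat rule.
rewriting = [
    ([("Professor", "?x")], ("Q", "?x")),
    ([("teaches", "?x", "?y")], ("Q", "?x")),
    ([("Student", "?x"), ("Professor", "?x")], ("BOT",)),
]
s_bot, s_q = build_test_suite(rewriting)
```

Each rule contributes exactly one test here, which is why the suite stays linear in the number of rules when injective instantiation suffices.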

Guarantees

Passing all tests in $S$ is sufficient for $(Q,T)$ -completeness for all monotonic, faithful reasoners. Refinements (e.g., injective instantiation with strong faithfulness) can yield test suites linear in $|R|$ .

Practical Considerations

Complexity: UCQ rewriting is PTIME in $|T|$ for OWL 2 QL but may be exponential in general.
Empirical results: On LUBM+SPARQL, test suite generation is typically below 1,000 cases with compact ABoxes. Systems like Jena Max and DLE-Jena were certified complete; others failed certain tests depending on unhandled inference patterns.

3. Query Rewriting, Optimization, and Expressibility Assessment

OBQC methods generalize to query rewriting and optimization scenarios for ontological CQs with tuple-generating dependencies (TGDs) (Gottlob et al., 2011), as well as expressibility/verification in OBDA (Lutz et al., 2020).

Perfect Rewriting and OBQC

Given ontological constraints $\Sigma$ and query $q$ , a perfect rewriting $Q$ is a UCQ such that for all DBs $D$ , $D \cup \Sigma \models q(\vec{X}) \iff D \models Q(\vec{X})$ . OBQC becomes the decision problem: "Is $Q$ a perfect rewriting of $q$ under $\Sigma$ ?"
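
As a small worked instance of this definition (the predicate names are illustrative assumptions, not taken from the cited work), take a single existential rule and an atomic query:

$$\Sigma = \{\, \mathit{Professor}(x) \rightarrow \exists y\, \mathit{teaches}(x, y) \,\}, \qquad q(x) = \exists y\, \mathit{teaches}(x, y)$$

$$Q(x) = \exists y\, \mathit{teaches}(x, y) \;\lor\; \mathit{Professor}(x)$$

For $D = \{\mathit{Professor}(a)\}$ we have $D \cup \Sigma \models q(a)$ , and $D \models Q(a)$ through the second disjunct, whereas evaluating $q$ directly over $D$ returns no answer. The check "is $Q$ a perfect rewriting of $q$ under $\Sigma$ ?" asks whether this agreement holds for every $D$ .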

Algorithmic core:

TGD-rewrite: Resolution-style backward-chaining (factorization + rewrite steps) produces $Q_{fin}$ .
Soundness and completeness: $Q_{fin}$ is guaranteed perfect iff the chase-based semantic conditions hold.
Optimization: For linear TGDs, atom elimination using dependency graphs and coverage relationships yields minimal UCQs, supporting efficient OBQC.
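
The rewrite step at the heart of backward chaining can be sketched as follows. This is a deliberately restricted illustration, not the TGD-rewrite algorithm itself: it assumes single-atom rule bodies, no existential variables, pairwise distinct head variables, and body variables that all occur in the head, so factorization is not needed. The encoding and the example predicates are hypothetical.

```python
def is_var(t):
    return t.startswith("?")

def unify_head(head, atom):
    """Map head variables to the terms of a query atom, or return None if they do not match."""
    if head[0] != atom[0] or len(head) != len(atom):
        return None
    sub = {}
    for h, a in zip(head[1:], atom[1:]):
        if is_var(h):
            if sub.setdefault(h, a) != a:
                return None
        elif h != a:
            return None
    return sub

def rewrite(query, rules):
    """Saturate a CQ (set of atoms) under rewrite steps; the result is a UCQ (set of CQs)."""
    start = frozenset(query)
    ucq = {start}
    frontier = [start]
    while frontier:
        cq = frontier.pop()
        for atom in cq:
            for body, head in rules:
                sub = unify_head(head, atom)
                if sub is None:
                    continue
                # Replace the matched atom by the rule body under the substitution.
                new_atom = (body[0],) + tuple(sub.get(t, t) for t in body[1:])
                new_cq = (cq - {atom}) | {new_atom}
                if new_cq not in ucq:
                    ucq.add(new_cq)
                    frontier.append(new_cq)
    return ucq

# Rules are (body_atom, head_atom) pairs, read as body -> head.
rules = [
    (("Professor", "?x"), ("Faculty", "?x")),
    (("FullProfessor", "?x"), ("Professor", "?x")),
    (("headOf", "?x", "?y"), ("worksFor", "?x", "?y")),
]
query = [("Faculty", "?u"), ("worksFor", "?u", "?d")]
ucq = rewrite(query, rules)
```

In this example the first atom has three variants and the second has two, so the saturated UCQ contains six conjunctive queries.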

OBDA Expressibility/Verification

For a given source query $q_s$ and mappings/ontology $(\mathcal{O}, \mathbf{M}, \mathbf{S})$ , the OBQC reasoning task is twofold:

Expressibility: Does there exist any $q_t$ such that $\mathit{ans}_{q_s}(D) = \mathit{cert}_{(\mathcal{O}, \mathit{sch}(\mathbf{M}), q_t)}(\mathbf{M}(D))$ for every $D$ ?
Verification: For candidate $q_t$ , does the above equality hold?

Key results:

For DL-Lite, both tasks are $\Pi^p_2$ -complete.
For $\mathcal{EL}$ / $\mathcal{ELHI}$ , complexity scales to coNEXPTIME or 2EXPTIME, depending on rootedness of source queries and generality.
Algorithms rely on forward/backward application of mappings and UCQ rewritings, with explicit homomorphism-based containment checks.
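
The homomorphism-based containment check mentioned above follows the classical criterion for conjunctive queries: $q_1 \subseteq q_2$ exactly when the body of $q_2$ maps homomorphically into the body of $q_1$ while preserving answer variables. The sketch below is a brute-force version under an assumed tuple encoding with `?`-prefixed variables; it only covers plain CQs without an ontology or mappings.

```python
def is_var(t):
    return t.startswith("?")

def homomorphism(source_atoms, target_atoms, fixed=None):
    """Backtracking search for a substitution sending every source atom into target_atoms.

    Terms of the target are treated as frozen constants."""
    atoms = list(source_atoms)
    target = list(target_atoms)

    def extend(i, sub):
        if i == len(atoms):
            return sub
        for cand in target:
            if cand[0] != atoms[i][0] or len(cand) != len(atoms[i]):
                continue
            new = dict(sub)
            ok = True
            for s, t in zip(atoms[i][1:], cand[1:]):
                if is_var(s):
                    if new.setdefault(s, t) != t:
                        ok = False
                        break
                elif s != t:
                    ok = False
                    break
            if ok:
                result = extend(i + 1, new)
                if result is not None:
                    return result
        return None

    return extend(0, dict(fixed or {}))

def contained_in(q1, q2):
    """True if q1 is contained in q2; each query is (answer_variables, body_atoms)."""
    (head1, body1), (head2, body2) = q1, q2
    if len(head1) != len(head2):
        return False
    fixed = {}
    for v2, v1 in zip(head2, head1):
        if fixed.setdefault(v2, v1) != v1:
            return False
    return homomorphism(body2, body1, fixed) is not None

# q1(x) :- teaches(x, y), Course(y)      q2(x) :- teaches(x, z)
q1 = (("?x",), [("teaches", "?x", "?y"), ("Course", "?y")])
q2 = (("?x",), [("teaches", "?x", "?z")])
```

Here `contained_in(q1, q2)` holds because `q1` is the more specific query, while the converse fails since nothing in `q2` can serve as the image of the `Course` atom.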

4. OBQC in Knowledge Graphs and LLM-Generated Query Validation

A newer instantiation of OBQC is deterministic, semantic validation of SPARQL queries generated by LLMs over knowledge graphs (Allemang et al., 20 May 2024).

Semantic Rule Checking

Formally, given an LLM-generated SPARQL query $Q$ and ontology $O$ (an RDF/OWL graph):

Extract BGP( $Q$ ) triples from WHERE clause.
For each triple, systematically apply a set of rules based on $O$ 's rdfs:domain, rdfs:range, and rdfs:subClassOf axioms, as well as property definitions.
Violation of any rule (e.g., mismatch between subject type and property domain, use of undefined properties, incompatible domains/ranges) generates a concrete, human-readable error explanation.
Each rule is implemented as a SPARQL meta-query over the combined query/ontology graph.
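
To make the domain, range, and undeclared-property checks concrete, here is a minimal sketch in plain Python. It does not reproduce the SPARQL meta-query implementation described above: the dictionary encoding of the ontology, the `in:` prefix, and all class and property names are assumptions for illustration, and only terms with an explicit rdf:type pattern in the query are checked.

```python
RDF_TYPE = "rdf:type"

def superclasses(cls, sub_class_of):
    """Reflexive-transitive closure of rdfs:subClassOf starting from one class."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def check_bgp(bgp, properties, domains, ranges, sub_class_of):
    """Return human-readable violations for a list of (s, p, o) triple patterns."""
    types = {}
    for s, p, o in bgp:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    errors = []
    for s, p, o in bgp:
        if p == RDF_TYPE:
            continue
        if p not in properties:
            errors.append(f"Incorrect-Property: {p} is not declared in the ontology")
            continue
        checks = (("Domain", s, domains.get(p)), ("Range", o, ranges.get(p)))
        for rule, term, expected in checks:
            if expected is None:
                continue
            for cls in sorted(types.get(term, ())):
                if expected not in superclasses(cls, sub_class_of):
                    errors.append(
                        f"{rule} Rule: {term} is typed {cls}, which is not a "
                        f"subclass of the {rule.lower()} {expected} of {p}"
                    )
    return errors

# Hypothetical insurance-style ontology fragment.
properties = {"in:soldByAgent"}
domains = {"in:soldByAgent": "in:Policy"}
ranges = {"in:soldByAgent": "in:Agent"}
sub_class_of = {"in:AutoPolicy": ["in:Policy"]}

bad_bgp = [("?c", RDF_TYPE, "in:Claim"), ("?c", "in:soldByAgent", "?a")]
good_bgp = [("?p", RDF_TYPE, "in:AutoPolicy"), ("?p", "in:soldByAgent", "?a"),
            ("?a", RDF_TYPE, "in:Agent")]
bad_errors = check_bgp(bad_bgp, properties, domains, ranges, sub_class_of)
good_errors = check_bgp(good_bgp, properties, domains, ranges, sub_class_of)
```

The first pattern set is rejected because a claim is used as the subject of a property whose domain is a policy; the second passes because the subclass closure connects the subject type to the declared domain.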

Table: Core OBQC semantic checking rules and their application

Rule Name	Checked Constraint	Error Triggered If
Domain Rule	Subject has appropriate rdf:type	Not subclass of domain
Range Rule	Object has appropriate rdf:type	Not subclass of range
Double-Domain	Multiple properties on same subject	Domains not mutually subclassed
Double-Range	Multiple properties on same object	Ranges not mutually subclassed
Incorrect-Property	Predicate declared in ontology	Not present

Integration with Repair Pipelines

Pipeline: LLM generates $Q$ → OBQC checks $Q$ → on a pass, $Q$ is executed; on a failure, the error explanations are passed back to the LLM as a repair prompt → iterate.
Empirical results: On a virtualized insurance KG (160 QA pairs), first-pass execution accuracy was 42.9%; OBQC-guided repair raised end-to-end accuracy to 72.6% with ∼8% "I don't know" and 19.4% error rate.
Rule prevalence: Double-domain (37.5%) and domain-range (22.8%) were most frequent error types.
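
The check-and-repair loop can be sketched as a small driver function. The names `generate`, `check`, and `repair`, the round limit, and the use of `None` for an "I don't know" outcome are assumptions for this illustration; the stubs stand in for real LLM and OBQC calls.

```python
from typing import Callable, List, Optional

def answer_with_repair(
    question: str,
    generate: Callable[[str], str],
    check: Callable[[str], List[str]],
    repair: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a query that passes all checks, or None to signal "I don't know"."""
    query = generate(question)
    for _ in range(max_rounds):
        errors = check(query)
        if not errors:
            return query                      # ontology-compliant: safe to execute
        query = repair(question, query, errors)
    # The last repair has not been checked yet.
    return query if not check(query) else None

# Stub components standing in for the LLM and the rule checker.
def stub_generate(question):
    return "bad-query"

def stub_check(query):
    return ["Domain Rule violated"] if query == "bad-query" else []

def stub_repair(question, query, errors):
    return "good-query"

def stub_failed_repair(question, query, errors):
    return "bad-query"

repaired = answer_with_repair("q?", stub_generate, stub_check, stub_repair)
unknown = answer_with_repair("q?", stub_generate, stub_check, stub_failed_repair)
```

Returning `None` rather than an unverified query reflects the design goal that only queries passing every check reach execution.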

OBQC's deterministic, interpretable nature ensures only ontology-compliant queries proceed to execution. This approach is particularly vital for mitigating the risks of semantic hallucination in LLM-powered QA settings.

5. Finite Model Reasoning and Guarantees

OBQC's theoretical foundations rely upon well-established results on finite controllability and decidable classes of existential rules (Amendola et al., 2017).

Finite Controllability

For any ontology $\Sigma$ in one of the five basic Datalog$^{\pm}$ fragments (linear, weakly-acyclic, guarded, sticky, and shy), Boolean conjunctive query answering is finitely controllable: for every database $D$ and Boolean CQ $q$ , $D \cup \Sigma \models q$ if and only if $D \cup \Sigma \models_{\mathrm{fin}} q$ , where $\models_{\mathrm{fin}}$ denotes entailment over finite models only.

Implication: OBQC procedures can restrict attention to finite (chase-based) model constructions without completeness loss.
Canonical Rewriting: Universal method translates an arbitrary OBQC instance to a propositional, joinless setting with corresponding query, preserving both finite and infinite entailment relations.
Complexity: Data complexity is in PTIME for all five fragments; combined complexity depends on the fragment, ranging from PSPACE for linear rules up to 2ExpTime for guarded and weakly-acyclic rules.

OBQC Algorithmic Steps (Strong Fragments)

Compute canonical rewritings ( $D^c, \Sigma^c, q^c$ ).
Build restricted chase for $(D^c, \Sigma^c)$ to saturation or until $q^c$ is entailed.
Return certification decision; complexity bounds follow from fragment class.
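
The chase step of this procedure can be illustrated with a small restricted chase. The sketch skips the canonical rewriting and takes a database, rules, and Boolean CQ as given; the tuple encoding, the `_:n` naming of labelled nulls, and the round budget are assumptions. A trigger fires only when the rule head is not already satisfied, which is what makes the chase "restricted".

```python
from itertools import count

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def matches(atoms, facts, sub=None):
    """Yield every substitution extending sub that maps all atoms into facts."""
    sub = sub or {}
    if not atoms:
        yield sub
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new = dict(sub)
        if all((new.setdefault(a, f) == f) if is_var(a) else a == f
               for a, f in zip(first[1:], fact[1:])):
            yield from matches(rest, facts, new)

def restricted_chase(facts, rules, query, max_rounds=10):
    """True if the query is entailed, False if the chase saturates without it,
    None if the round budget runs out first."""
    facts = set(facts)
    nulls = count(1)
    for _ in range(max_rounds):
        if next(matches(query, facts), None) is not None:
            return True
        new_facts = set()
        for body, head in rules:
            for sub in list(matches(body, facts)):
                # Restricted chase: skip triggers whose head is already satisfied.
                if next(matches(head, facts | new_facts, sub), None) is not None:
                    continue
                full = dict(sub)
                for atom in head:
                    args = []
                    for t in atom[1:]:
                        if is_var(t) and t not in full:
                            full[t] = f"_:n{next(nulls)}"   # fresh labelled null
                        args.append(full.get(t, t))
                    new_facts.add((atom[0], *args))
        if not new_facts:
            return False
        facts |= new_facts
    return True if next(matches(query, facts), None) is not None else None

db = {("Professor", "alice")}
rules = [
    ([("Professor", "?x")], [("teaches", "?x", "?y")]),   # ?y is existential
    ([("teaches", "?x", "?y")], [("Course", "?y")]),
]
entailed = restricted_chase(db, rules, [("teaches", "?u", "?c"), ("Course", "?c")])
not_entailed = restricted_chase(db, rules, [("Student", "?s")])
```

On this input the chase adds one labelled null for the course taught by `alice`, then saturates, so the second query is refuted rather than left undecided.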

6. Applications and Comparative Evaluation

OBQC is applied across software certification, query rewriting, OBDA bootstrapping, and LLM-powered KG interrogation.

Reasoner Comparison

Formal criteria allow ranking of reasoners:

For reasoners $\mathcal{R}_1, \mathcal{R}_2$ , define $\mathcal{R}_1 \preceq_{Q,T} \mathcal{R}_2$ if the answer set of $\mathcal{R}_1$ is always contained in that of $\mathcal{R}_2$ across all ABoxes, and unsat detection is at least as strong.
Representative ABox generation via subset-closed rewriting enables practical, finite testing for comparative certification.
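
Over a finite set of representative ABoxes, the ordering above reduces to a direct comparison loop. The sketch below assumes reasoners are callables in the $(Q,T,A)$ form of Section 1 that return a set of answers or an `UNSAT` marker; the two toy reasoners and the encoding of axioms as subclass pairs are hypothetical.

```python
UNSAT = "unsat"

def at_most_as_complete(r1, r2, query, tbox, test_aboxes):
    """Check r1 <= r2 on the given ABoxes: r2 finds every answer and every unsat case r1 finds."""
    for abox in test_aboxes:
        a1 = r1(query, tbox, abox)
        a2 = r2(query, tbox, abox)
        if a1 == UNSAT:
            if a2 != UNSAT:
                return False          # r1 detects unsatisfiability, r2 misses it
        elif a2 != UNSAT and not a1 <= a2:
            return False              # r1 returns an answer that r2 lacks
    return True

# Toy reasoners for an atomic class query; the TBox is a set of (sub, super) pairs.
def explicit_only(query, tbox, abox):
    return {ind for cls, ind in abox if cls == query}

def one_step_subclass(query, tbox, abox):
    subs = {query} | {sub for sub, sup in tbox if sup == query}
    return {ind for cls, ind in abox if cls in subs}

tbox = {("Professor", "Faculty")}
tests = [{("Professor", "c1")}, {("Faculty", "c2")}]
weak_below_strong = at_most_as_complete(explicit_only, one_step_subclass, "Faculty", tbox, tests)
strong_below_weak = at_most_as_complete(one_step_subclass, explicit_only, "Faculty", tbox, tests)
```

The comparison is only as informative as the test ABoxes, which is why the representative ABoxes are generated from the rewriting rather than chosen by hand.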

Empirical Benchmarks

LUBM+SPARQL: Efficient OBQC-based test suite generation, systematic detection of completeness/incompleteness across major Semantic Web reasoner platforms.
LLM-KG QA (insurance): OBQC raised executed QA accuracy from 54.2% (Text-to-SPARQL baseline) to 72.6% with repair, establishing the practical value of ontological checks in neural-symbolic QA pipelines.

7. Complexity, Limitations, and Research Directions

Complexity Boundaries

OBQC procedures are tractable in data complexity for all mainstream Datalog$^{\pm}$ and OWL 2 QL fragments. Combined complexity is exponential in the worst case but typically manageable for practical ontologies and queries.
Expressibility/verification in OBDA is $\Pi^p_2$ -complete for DL-Lite, coNEXPTIME/2EXPTIME for $\mathcal{EL}$ / $\mathcal{ELHI}$ .

Limitations

Exponential size of test suites or rewritings for complex or highly recursive ontologies.
Some, but not all, fragments admit UCQ rewriting; for others, datalog-based or chase-based procedures are mandatory.
Scope restricted to frameworks with finite controllability; undecidable fragments or those with infinite models may require further research.

Future Developments

Richer semantic rule-sets for OBQC in LLM repair loops may further boost achievable repair rates.
Automated reduction/optimization of test suite sizes remains an ongoing research area.
Generalization to non-monotonic or probabilistic KBs is an open challenge.

OBQC unifies several research threads in semantic technologies, enabling both theoretical guarantees and practical certification of query answering systems with respect to expressive ontological constraints (Grau et al., 2014, Gottlob et al., 2011, Lutz et al., 2020, Amendola et al., 2017, Allemang et al., 20 May 2024). Its logic-based machinery ensures reliability and explainability, particularly in settings emphasizing semantic correctness and scalable deployment of ontology-aware information systems.

References (5)

Grau et al. (2014). Completeness Guarantees for Incomplete Ontology Reasoners: Theory and Practice.

Gottlob et al. (2011). Ontological Queries: Rewriting and Optimization (Extended Version).

Lutz et al. (2020). Query Expressibility and Verification in Ontology-Based Data Access.

Allemang et al. (2024). Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue!

Amendola et al. (2017). Finite model reasoning over existential rules.