
Ontology-based Query Check (OBQC)

Updated 17 November 2025
  • OBQC is a framework that uses logic-based techniques, including UCQ rewriting and chase-based methods, to certify and validate queries against ontological constraints.
  • It constructs finite test suites via injective instantiation to certify reasoner completeness, and applies rule-based checks to verify the semantic correctness of queries in diverse settings.
  • OBQC is applied in areas like Semantic Web reasoning, OBDA, and LLM-generated SPARQL validation, enhancing query repair and system certification.

Ontology-based Query Check (OBQC) refers to an array of logic-based techniques for verifying, certifying, or validating properties of queries in the presence of ontologies or ontological constraints; the queries are typically conjunctive queries (CQs), unions of conjunctive queries (UCQs), or SPARQL queries. OBQC addresses decidability, completeness, expressibility, and semantic-correctness issues across several Semantic Web, database, and knowledge graph settings, including software reasoner certification, validation of LLM-generated queries, and ontology-based query rewriting.

1. Foundational Definitions and Formal Models

OBQC frameworks are grounded in first-order logic, Description Logics (DLs), and database theory. The general OBQC setup includes:

  • Signature $\Sigma=(\mathsf{P}, \mathsf{I})$ with predicates $\mathsf{P}$ and individual constants $\mathsf{I}$.
  • Ontology (TBox) $T$: finite set of axioms, often in DL or existential rule form.
  • Data (ABox) $A$: finite set of ground facts.
  • Query $Q$: given as an FOL formula or, in the SPARQL case, as a Basic Graph Pattern over the KG, sometimes parameterized by answer variables (non-Boolean).
  • Mappings (GAV, for OBDA setups): rules rewriting source schema views to ontology predicates.

A reasoner $\mathcal{R}$ is modeled as a function mapping $(Q,T,A)$ either to the set of answer tuples or to an "unsat" flag, with the following essential properties:

  • Soundness: returned answers are entailed;
  • Monotonicity: adding facts or axioms does not remove answers;
  • Renaming invariance (faithfulness): when individuals are renamed injectively, the answer set changes only by the same renaming.

Fundamental OBQC problems include:

  • Completeness check: Is $\mathcal{R}$ complete for query $Q$ and ontology $T$, i.e., does $\mathcal{R}(Q, T, A) \supseteq \mathit{cert}(Q, T, A)$ hold for every $A$?
  • Expressibility and verification: For mapping scenarios, does there exist a $q_t$ over an ontology/mapping such that for all source DBs $D$, $\mathit{ans}_{q_s}(D) = \mathit{cert}_{(\mathcal{O}, \mathit{sch}(\mathcal{M}), q_t)}(\mathcal{M}(D))$?
  • Semantic validation (SPARQL/KG): Does an LLM-generated query $Q$ align with the semantic constraints imposed by ontology $O$?

2. Logic-based Approach to Reasoner Certification

For DL and OWL-based settings, OBQC techniques enable automated, finite certification of reasoner completeness for given $(Q,T)$ pairs, even when the reasoner is not complete in general (Grau et al., 2014).

Test Suite Construction

Key steps:

  1. UCQ or Datalog$^{\pm}$ rewriting: Compute a rewriting $R$ of query $Q$ w.r.t. ontology $T$, yielding rule sets $(R_{\bot}, R_Q)$ in the UCQ case, or $(R_D, R_{\bot}, R_Q)$ when recursion is needed.
  2. Injective instantiation: For each rule $r$ in $R$, ground its body by mapping each variable to a distinct fresh constant, yielding a small test ABox $A_r$. Rules whose head is $Q$ yield query tests $(A_r, Q)$; rules whose head is $\bot$ yield unsatisfiability tests.
  3. Test suite $S = (S_{\bot}, S_Q)$: $S_{\bot}$ contains the ABoxes that must be detected as unsatisfiable; $S_Q$ contains the pairs $(A_r, Q)$ on which the expected answers must be returned.
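Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions, not the cited implementation: atoms are `(predicate, args)` tuples, terms starting with `?` are variables, rules are `(head_pred, head_args, body)` triples with `head_pred` either `"Q"` or `"BOT"`, and the TBox, predicate names, and the `c0, c1, …` constant scheme are hypothetical. Only the injective instantiation of step 2 is implemented.

```python
from itertools import count

# Assumed encoding: an atom is (predicate, args); terms starting with "?" are
# variables. A rewriting rule is (head_pred, head_args, body), where head_pred
# is "Q" (query rule) or "BOT" (unsatisfiability rule).

def is_var(term):
    return term.startswith("?")

def instantiate_injectively(body, fresh):
    """Ground a rule body, mapping each distinct variable to a distinct fresh constant."""
    mapping = {}
    abox = set()
    for pred, args in body:
        ground = []
        for t in args:
            if is_var(t):
                if t not in mapping:
                    mapping[t] = f"c{next(fresh)}"
                ground.append(mapping[t])
            else:
                ground.append(t)  # constants are kept unchanged
        abox.add((pred, tuple(ground)))
    return frozenset(abox), mapping

def build_test_suite(rules):
    """Return (S_bot, S_Q): unsat-test ABoxes and (ABox, expected answer) pairs."""
    fresh = count()  # shared counter, so constants are fresh across all tests
    s_bot, s_q = [], []
    for head_pred, head_args, body in rules:
        abox, mapping = instantiate_injectively(body, fresh)
        if head_pred == "BOT":
            s_bot.append(abox)
        else:
            expected = tuple(mapping.get(t, t) for t in head_args)
            s_q.append((abox, expected))
    return s_bot, s_q

# Hypothetical UCQ rewriting of Q(x) <- Student(x) w.r.t. the toy TBox
# {GradStudent ⊑ Student, ∃takesCourse ⊑ Student, Student ⊓ Course ⊑ ⊥}.
RULES = [
    ("Q", ("?x",), [("Student", ("?x",))]),
    ("Q", ("?x",), [("GradStudent", ("?x",))]),
    ("Q", ("?x",), [("takesCourse", ("?x", "?y"))]),
    ("BOT", (), [("Student", ("?x",)), ("Course", ("?x",))]),
    ("BOT", (), [("GradStudent", ("?x",)), ("Course", ("?x",))]),
    ("BOT", (), [("takesCourse", ("?x", "?y")), ("Course", ("?x",))]),
]

s_bot, s_q = build_test_suite(RULES)
for abox, expected in s_q:
    print(sorted(abox), "->", expected)
print("unsat tests:", [sorted(a) for a in s_bot])
```

Each rule yields exactly one test, so in this sketch the suite size is linear in the number of rewriting rules.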

Guarantees

Passing all tests in $S$ is sufficient for $(Q,T)$-completeness for all monotonic, faithful reasoners. Refinements (e.g., injective instantiation with strong faithfulness) can yield test suites linear in $|R|$.
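Running a suite against a reasoner amounts to checking every unsat test and every expected answer. The sketch below makes several assumptions: a reasoner is modeled as a callable from an ABox to either `"UNSAT"` or a set of answer tuples (with $Q$ and $T$ fixed), the two toy reasoners are hypothetical stand-ins rather than real systems, and the suite is hard-coded in the same assumed encoding as an injective instantiation of a toy rewriting.

```python
UNSAT = "UNSAT"

def run_test_suite(reasoner, s_bot, s_q):
    """Return the failed tests; an empty list means the reasoner passed every test."""
    failures = []
    for abox in s_bot:
        if reasoner(abox) != UNSAT:
            failures.append(("missed_unsat", abox))
    for abox, expected in s_q:
        result = reasoner(abox)
        # An UNSAT verdict entails every tuple, so it cannot miss an answer.
        if result != UNSAT and expected not in result:
            failures.append(("missed_answer", abox, expected))
    return failures

def make_toy_reasoner(use_domain_axiom):
    """Hypothetical reasoner for Q(x) <- Student(x) under the toy TBox
    {GradStudent ⊑ Student, ∃takesCourse ⊑ Student, Student ⊓ Course ⊑ ⊥}.
    With use_domain_axiom=False it ignores ∃takesCourse ⊑ Student (incomplete)."""
    def reason(abox):
        students = {args[0] for pred, args in abox if pred in ("Student", "GradStudent")}
        if use_domain_axiom:
            students |= {args[0] for pred, args in abox if pred == "takesCourse"}
        courses = {args[0] for pred, args in abox if pred == "Course"}
        if students & courses:
            return UNSAT
        return {(s,) for s in students}
    return reason

# Test suite from injective instantiation of the toy rewriting.
S_BOT = [
    frozenset({("Student", ("c4",)), ("Course", ("c4",))}),
    frozenset({("GradStudent", ("c5",)), ("Course", ("c5",))}),
    frozenset({("takesCourse", ("c6", "c7")), ("Course", ("c6",))}),
]
S_Q = [
    (frozenset({("Student", ("c0",))}), ("c0",)),
    (frozenset({("GradStudent", ("c1",))}), ("c1",)),
    (frozenset({("takesCourse", ("c2", "c3"))}), ("c2",)),
]

complete = make_toy_reasoner(use_domain_axiom=True)
incomplete = make_toy_reasoner(use_domain_axiom=False)
print(len(run_test_suite(complete, S_BOT, S_Q)))    # 0
print(len(run_test_suite(incomplete, S_BOT, S_Q)))  # 2
```

The incomplete toy reasoner fails exactly the two tests that depend on the ignored axiom, which is the kind of diagnostic signal the test-suite approach is meant to provide.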

Practical Considerations

  • Complexity: UCQ rewriting is PTIME in $|T|$ for OWL 2 QL but may be exponential in general.
  • Empirical results: On LUBM+SPARQL, generated test suites typically contain fewer than 1,000 tests with compact ABoxes. Systems like Jena Max and DLE-Jena were certified complete; others failed certain tests, depending on which inference patterns they did not handle.

3. Query Rewriting, Optimization, and Expressibility Assessment

OBQC methods generalize to query rewriting and optimization scenarios for ontological CQs with tuple-generating dependencies (TGDs) (Gottlob et al., 2011), as well as expressibility/verification in OBDA (Lutz et al., 2020).

Perfect Rewriting and OBQC

Given ontological constraints $\Sigma$ and query $q$, a perfect rewriting $Q$ is a UCQ such that for all DBs $D$, $D \cup \Sigma \models q(\vec{X}) \iff D \models Q(\vec{X})$. OBQC then becomes the decision problem: "Is $Q$ a perfect rewriting of $q$ under $\Sigma$?"
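A small worked instance of this definition, using a hypothetical one-rule ontology chosen for illustration (it is not taken from the cited work):

```latex
\Sigma = \{\, \mathit{Prof}(x) \rightarrow \exists y\, \mathit{teaches}(x,y) \,\}, \qquad
q(x) \leftarrow \mathit{teaches}(x,y)

Q = \{\; q(x) \leftarrow \mathit{teaches}(x,y), \;\; q(x) \leftarrow \mathit{Prof}(x) \;\}

D = \{\mathit{Prof}(a)\}: \qquad D \cup \Sigma \models q(a) \quad\text{and}\quad D \models Q(a)
```

By contrast, for $q'(x) \leftarrow \mathit{teaches}(x,y), \mathit{Course}(y)$ the rule cannot be used: $y$ is shared with $\mathit{Course}(y)$ but would have to fill the existential position, so under this $\Sigma$ the query $q'$ is its own perfect rewriting.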

Algorithmic core:

  • TGD-rewrite: Resolution-style backward chaining (factorization + rewrite steps) produces $Q_{fin}$.
  • Soundness and completeness: $Q_{fin}$ is shown to be a perfect rewriting by relating the rewriting steps to the chase.
  • Optimization: For linear TGDs, atom elimination using dependency graphs and coverage relationships yields minimal UCQs, supporting efficient OBQC.
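The backward-chaining loop can be sketched for a restricted case. This is a simplified Python sketch in the spirit of TGD-rewrite, not the published algorithm. It assumes linear TGDs with one body atom and one head atom, head atoms without constants or repeated variables, and the same toy atom encoding. It factorizes every unifiable pair of atoms (sound, but it produces redundant CQs) instead of only those that enable a rewrite, and it omits the atom-elimination optimization.

```python
from itertools import count, permutations

# Assumed encoding: an atom is (pred, args); "?"-prefixed terms are variables.
# A CQ is (head_terms, body) with body a frozenset of atoms. A linear TGD is
# (body_atom, head_atom, existential_vars); head atoms are assumed to have no
# constants and no repeated variables, so matching the head needs no full MGU.

def is_var(t):
    return t.startswith("?")

def canonical(cq):
    """Key invariant under variable renaming: minimum over all atom orders."""
    head, body = cq
    best = None
    for order in permutations(sorted(body)):
        names = {}
        def ren(t):
            return names.setdefault(t, f"?v{len(names)}") if is_var(t) else t
        key = (tuple(ren(t) for t in head),
               tuple((p, tuple(ren(t) for t in args)) for p, args in order))
        if best is None or key < best:
            best = key
    return best

def unify(a1, a2, answer_vars):
    """Most general unifier of two atoms as a dict, or None if they do not unify."""
    (p1, args1), (p2, args2) = a1, a2
    if p1 != p2 or len(args1) != len(args2):
        return None
    subst = {}
    def find(t):
        while t in subst:
            t = subst[t]
        return t
    for s, t in zip(args1, args2):
        s, t = find(s), find(t)
        if s == t:
            continue
        if is_var(s) and (s not in answer_vars or not is_var(t)):
            subst[s] = t  # prefer binding non-answer variables
        elif is_var(t):
            subst[t] = s
        else:
            return None  # two distinct constants
    return {v: find(v) for v in subst}

def substitute(cq, sub):
    head, body = cq
    f = lambda t: sub.get(t, t)
    return (tuple(f(t) for t in head),
            frozenset((p, tuple(f(t) for t in args)) for p, args in body))

def factorizations(cq):
    """Merge each pair of unifiable body atoms (the result is contained in cq)."""
    atoms = sorted(cq[1])
    out = []
    for i, a1 in enumerate(atoms):
        for a2 in atoms[i + 1:]:
            mgu = unify(a1, a2, set(cq[0]))
            if mgu is not None:
                out.append(substitute(cq, mgu))
    return out

def rewrite_steps(cq, tgd, fresh):
    """Resolve one body atom of cq with the TGD head, where the TGD is applicable."""
    head, body = cq
    k = next(fresh)  # rename the TGD apart from the query
    rn = lambda t: f"{t}_{k}" if is_var(t) else t
    (bp, bargs), (hp, hargs), exist = tgd
    bargs, hargs, exist = [rn(t) for t in bargs], [rn(t) for t in hargs], {rn(z) for z in exist}
    terms = list(head) + [t for _, args in body for t in args]
    out = []
    for atom in body:
        p, args = atom
        if p != hp or len(args) != len(hargs):
            continue
        theta = dict(zip(hargs, args))
        # Applicability: an existential position must hold a variable occurring
        # exactly once in the query (neither shared nor an answer variable).
        if any(z in exist and not (is_var(t) and terms.count(t) == 1)
               for z, t in theta.items()):
            continue
        new_atom = (bp, tuple(theta.get(t, t) for t in bargs))
        out.append((head, (body - {atom}) | {new_atom}))
    return out

def tgd_rewrite(cq, tgds):
    """Saturate {cq} under factorization and rewriting; return the resulting UCQ."""
    fresh = count()
    seen = {canonical(cq): cq}
    todo = [cq]
    while todo:
        q = todo.pop()
        for r in factorizations(q) + [r for t in tgds for r in rewrite_steps(q, t, fresh)]:
            key = canonical(r)
            if key not in seen:
                seen[key] = r
                todo.append(r)
    return list(seen.values())

# Hypothetical example: Prof(x) -> ∃y teaches(x, y); q(a) <- teaches(a, b), teaches(c, b).
TGDS = [(("Prof", ("?x",)), ("teaches", ("?x", "?y")), {"?y"})]
q = (("?a",), frozenset({("teaches", ("?a", "?b")), ("teaches", ("?c", "?b"))}))
for cq in tgd_rewrite(q, TGDS):
    print(cq[0], sorted(cq[1]))
```

In this example neither atom of the original query can be rewritten, because `?b` is shared; only after factorization merges the two atoms does the rewrite to `Prof(?a)` become applicable, which is why the factorization step is needed for completeness.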

OBDA Expressibility/Verification

For a given source query $q_s$ and OBDA specification $(\mathcal{O}, \mathcal{M}, \mathbf{S})$ (ontology, mapping, and source schema), the OBQC reasoning task is twofold:

  1. Expressibility: Does there exist any $q_t$ such that $\mathit{ans}_{q_s}(D) = \mathit{cert}_{(\mathcal{O}, \mathit{sch}(\mathcal{M}), q_t)}(\mathcal{M}(D))$ for every $D$?
  2. Verification: For a candidate $q_t$, does the above equality hold for every $D$?

Key results:

  • For DL-Lite, both tasks are $\Pi^p_2$-complete.
  • For $\mathcal{EL}$/$\mathcal{ELHI}$, complexity rises to coNEXPTIME or 2EXPTIME, depending on the DL and on whether the source queries are rooted.
  • Algorithms rely on forward/backward application of mappings and UCQ rewritings, with explicit homomorphism-based containment checks.

4. OBQC in Knowledge Graphs and LLM-Generated Query Validation

A newer instantiation of OBQC is deterministic, semantic validation of SPARQL queries generated by LLMs over knowledge graphs (Allemang et al., 20 May 2024).

Semantic Rule Checking

Formally, given an LLM-generated SPARQL query $Q$ and an ontology $O$ (an RDF/OWL graph):

  • Extract the triples of the basic graph pattern BGP($Q$) from the WHERE clause.
  • For each triple, systematically apply a set of rules based on $O$'s rdfs:domain, rdfs:range, and rdfs:subClassOf axioms, as well as its property definitions.
  • Violation of any rule (e.g., mismatch between subject type and property domain, use of undefined properties, incompatible domains/ranges) generates a concrete, human-readable error explanation.
  • Each rule is implemented as a SPARQL meta-query over the combined query/ontology graph.

Table: Core OBQC semantic checking rules and their application

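| Rule Name | Checked Constraint | Error Triggered If |
| --- | --- | --- |
| Domain Rule | Subject has an appropriate rdf:type | Subject's type is not a subclass of the property's domain |
| Range Rule | Object has an appropriate rdf:type | Object's type is not a subclass of the property's range |
| Double-Domain | Multiple properties on the same subject | The properties' domains are not related by rdfs:subClassOf in either direction |
| Double-Range | Multiple properties on the same object | The properties' ranges are not related by rdfs:subClassOf in either direction |
| Incorrect-Property | Predicate is declared in the ontology | Predicate is not declared |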
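The two "double" rules in the table above compare pairs of properties that share a subject or an object. A minimal Python sketch, under two assumptions: "not mutually subclassed" is read as "neither class is a subclass of the other", and the ontology uses the same hypothetical dict encoding with invented `ex:` names. The cited work expresses these rules as SPARQL meta-queries instead.

```python
from itertools import combinations

def is_subclass(c, d, sub_of):
    """Reflexive-transitive rdfs:subClassOf test (sub_of: class -> direct superclasses)."""
    stack, seen = [c], set()
    while stack:
        x = stack.pop()
        if x == d:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(sub_of.get(x, ()))
    return False

def check_double(triples, onto):
    """Double-Domain / Double-Range: properties sharing a subject (object) must have
    domains (ranges) related by subClassOf in one direction or the other."""
    sub_of = onto["subClassOf"]
    errors = []
    for rule, pos, axis in (("Double-Domain", 0, "domain"), ("Double-Range", 2, "range")):
        props_by_term = {}
        for triple in triples:
            if triple[1] != "rdf:type":
                props_by_term.setdefault(triple[pos], set()).add(triple[1])
        for term, props in props_by_term.items():
            for p1, p2 in combinations(sorted(props), 2):
                c1, c2 = onto[axis].get(p1), onto[axis].get(p2)
                if c1 and c2 and not (is_subclass(c1, c2, sub_of) or is_subclass(c2, c1, sub_of)):
                    errors.append(f"{rule}: {term} is used with {p1} ({axis} {c1}) "
                                  f"and {p2} ({axis} {c2}), which are unrelated classes")
    return errors

# Hypothetical ontology: a single subject cannot be both a Policy and a Driver.
ONTO = {
    "domain": {"ex:hasClaim": "ex:Policy", "ex:licenseNumber": "ex:Driver"},
    "range": {"ex:hasClaim": "ex:Claim", "ex:licenseNumber": "xsd:string"},
    "subClassOf": {},
}
BGP = [("?x", "ex:hasClaim", "?c"), ("?x", "ex:licenseNumber", "?l")]
print(check_double(BGP, ONTO))
```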

Integration with Repair Pipelines

  • Pipeline: the LLM generates $Q$ → OBQC checks it → pass/fail → explanations of failed checks are fed back to the LLM in a repair prompt → iterate.
  • Empirical results: On a virtualized insurance KG (160 QA pairs), first-pass execution accuracy was 42.9%; OBQC-guided repair raised end-to-end accuracy to 72.6%, with ∼8% "I don't know" responses and a 19.4% error rate.
  • Rule prevalence: Double-domain (37.5%) and domain-range (22.8%) were most frequent error types.
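The check-and-repair loop can be sketched as a small control-flow function. Everything here is an assumption for illustration: `generate`, `check`, and `repair` are hypothetical callables standing in for the LLM calls and the OBQC checker, `max_repairs` is an invented bound, and returning `None` models an "I don't know" answer once the repair budget is exhausted.

```python
from typing import Callable, List, Optional

def answer_with_repair(
    question: str,
    generate: Callable[[str], str],
    check: Callable[[str], List[str]],
    repair: Callable[[str, str, List[str]], str],
    max_repairs: int = 3,
) -> Optional[str]:
    """Generate a query, check it, and feed error explanations back for repair.
    Returns a query that passes every check, or None ("I don't know")."""
    query = generate(question)
    for attempt in range(max_repairs + 1):
        errors = check(query)
        if not errors:
            return query  # only rule-compliant queries proceed to execution
        if attempt < max_repairs:
            query = repair(question, query, errors)
    return None

# Hypothetical stand-ins for the LLM and the OBQC checker.
def fake_generate(question):
    return "SELECT ?p WHERE { ?p ex:holderName ?n }"

def fake_check(query):
    return ["Incorrect-Property: ex:holderName"] if "ex:holderName" in query else []

def fake_repair(question, query, errors):
    return query.replace("ex:holderName", "ex:policyNumber")

print(answer_with_repair("Which policies exist?", fake_generate, fake_check, fake_repair))
```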

OBQC's deterministic, interpretable nature ensures that only queries passing every ontology rule check proceed to execution. This approach is particularly valuable for mitigating semantic hallucination in LLM-powered QA settings.

5. Finite Model Reasoning and Guarantees

OBQC's theoretical foundations rely upon well-established results on finite controllability and decidable classes of existential rules (Amendola et al., 2017).

Finite Controllability

For any set of rules $\Sigma$ from the five basic Datalog$^{\pm}$ fragments (linear, weakly-acyclic, guarded, sticky, and shy), Boolean conjunctive query answering is finitely controllable: for every database $D$, $D \cup \Sigma \models q$ if and only if $D \cup \Sigma \models_{\mathrm{fin}} q$ (i.e., the query holds in all finite models).

  • Implication: OBQC procedures can restrict attention to finite (chase-based) model constructions without completeness loss.
  • Canonical rewriting: A universal method translates an arbitrary OBQC instance into a propositional, joinless setting with a corresponding query, preserving both finite and unrestricted entailment.
  • Complexity: Data complexity is at most PTIME in all five fragments (AC$^0$ for linear and sticky rules); combined complexity ranges from PSPACE (linear) to 2ExpTime (guarded, weakly-acyclic).

OBQC Algorithmic Steps (Strong Fragments)

  • Compute the canonical rewriting $(D^c, \Sigma^c, q^c)$.
  • Build the restricted chase for $(D^c, \Sigma^c)$ until saturation or until $q^c$ is entailed.
  • Return certification decision; complexity bounds follow from fragment class.
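The chase step can be sketched as follows. This minimal Python sketch chases an arbitrary $(D, \Sigma)$ directly and skips the canonical-rewriting step. It assumes single-head TGDs encoded as `(body_atoms, head_atom, existential_vars)`, labeled nulls written `_:n0, _:n1, …`, and a hypothetical `max_rounds` bound, because the restricted chase need not terminate in general. Predicate names are invented.

```python
from itertools import count

# Assumed encoding: atoms are (pred, args); "?"-prefixed terms are variables;
# a TGD is (body_atoms, head_atom, existential_vars) with a single head atom.

def is_var(t):
    return t.startswith("?")

def matches(atoms, facts, subst):
    """Yield every extension of subst that maps all `atoms` into `facts`."""
    if not atoms:
        yield subst
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        s = dict(subst)
        if all(s.setdefault(a, b) == b if is_var(a) else a == b
               for a, b in zip(args, fargs)):
            yield from matches(rest, facts, s)

def entails(query, facts):
    return next(matches(query, facts, {}), None) is not None

def restricted_chase(facts, tgds, query, max_rounds=50):
    """True if the Boolean CQ is entailed, False if the chase saturates without it,
    None if max_rounds is exhausted before either happens."""
    facts = set(facts)
    nulls = count()
    for _ in range(max_rounds):
        if entails(query, facts):
            return True
        new = set()
        for body, head, exist in tgds:
            for h in list(matches(body, facts, {})):
                # Restricted chase: skip triggers whose head is already satisfied.
                if next(matches([head], facts | new, h), None) is not None:
                    continue
                ext = dict(h)
                ext.update({z: f"_:n{next(nulls)}" for z in exist})
                new.add((head[0], tuple(ext.get(t, t) for t in head[1])))
        if not new:
            return False  # saturated: the result is a universal model with no match
        facts |= new
    return None

D = {("Prof", ("alice",))}
TGDS = [
    ([("Prof", ("?x",))], ("teaches", ("?x", "?y")), {"?y"}),
    ([("teaches", ("?x", "?y"))], ("Course", ("?y",)), set()),
]
print(restricted_chase(D, TGDS, [("teaches", ("?x", "?y")), ("Course", ("?y",))]))  # True
print(restricted_chase(D, TGDS, [("Student", ("?x",))]))                            # False
```

With a cyclic rule set such as `Person(x) -> ∃y hasParent(x, y)` and `hasParent(x, y) -> Person(y)`, every round adds new nulls, so a non-entailed query leaves the function returning `None` at the bound rather than a definite answer.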

6. Applications and Comparative Evaluation

OBQC is applied across software certification, query rewriting, OBDA bootstrapping, and LLM-powered KG interrogation.

Reasoner Comparison

Formal criteria allow ranking of reasoners:

  • For reasoners $\mathcal{R}_1, \mathcal{R}_2$, define $\mathcal{R}_1 \preceq_{Q,T} \mathcal{R}_2$ if the answer set of $\mathcal{R}_1$ is contained in that of $\mathcal{R}_2$ for every ABox, and the unsat detection of $\mathcal{R}_2$ is at least as strong as that of $\mathcal{R}_1$.
  • Representative ABox generation via subset-closed rewriting enables practical, finite testing for comparative certification.
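The ordering $\preceq_{Q,T}$ can be checked on a finite ABox sample as sketched below. This is only an illustration under stated assumptions: reasoners are callables from an ABox to `"UNSAT"` or a set of answer tuples, the toy reasoners are hypothetical, and the ABox sample is simply given. The representative-ABox construction that makes the comparison conclusive is not implemented here, so a `True` result only holds for the sample.

```python
UNSAT = "UNSAT"

def at_most_as_complete(r1, r2, aboxes):
    """Check R1 ⪯ R2 on a finite ABox sample: wherever R2 does not report UNSAT,
    R1 must not report UNSAT either, and R1's answers must be a subset of R2's."""
    for abox in aboxes:
        a1, a2 = r1(abox), r2(abox)
        if a2 == UNSAT:
            continue  # R2 detects unsatisfiability, which covers any answer of R1
        if a1 == UNSAT or not a1 <= a2:
            return False
    return True

def make_toy_reasoner(use_domain_axiom):
    """Hypothetical reasoner for Q(x) <- Student(x) under the toy TBox
    {GradStudent ⊑ Student, ∃takesCourse ⊑ Student, Student ⊓ Course ⊑ ⊥}."""
    def reason(abox):
        students = {args[0] for pred, args in abox if pred in ("Student", "GradStudent")}
        if use_domain_axiom:
            students |= {args[0] for pred, args in abox if pred == "takesCourse"}
        courses = {args[0] for pred, args in abox if pred == "Course"}
        if students & courses:
            return UNSAT
        return {(s,) for s in students}
    return reason

ABOXES = [
    frozenset({("Student", ("c0",))}),
    frozenset({("takesCourse", ("c2", "c3"))}),
    frozenset({("takesCourse", ("c6", "c7")), ("Course", ("c6",))}),
]
complete = make_toy_reasoner(True)
incomplete = make_toy_reasoner(False)
print(at_most_as_complete(incomplete, complete, ABOXES))  # True
print(at_most_as_complete(complete, incomplete, ABOXES))  # False
```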

Empirical Benchmarks

  • LUBM+SPARQL: Efficient OBQC-based test suite generation, systematic detection of completeness/incompleteness across major Semantic Web reasoner platforms.
  • LLM-KG QA (insurance): OBQC raised executed QA accuracy from 54.2% (Text-to-SPARQL baseline) to 72.6% with repair, establishing the practical value of ontological checks in neural-symbolic QA pipelines.

7. Complexity, Limitations, and Research Directions

Complexity Boundaries

  • OBQC procedures are tractable in data complexity for all mainstream Datalog$^{\pm}$ and OWL 2 QL fragments. Combined complexity is higher (NP-complete for OWL 2 QL, up to 2ExpTime for guarded rules) but often manageable for practical ontologies and queries.
  • Expressibility/verification in OBDA is $\Pi^p_2$-complete for DL-Lite and coNEXPTIME/2EXPTIME for $\mathcal{EL}$/$\mathcal{ELHI}$.

Limitations

  • Exponential size of test suites or rewritings for complex or highly recursive ontologies.
  • Some, but not all, fragments admit UCQ rewriting; for others, datalog-based or chase-based procedures are mandatory.
  • Scope restricted to frameworks with finite controllability; undecidable fragments or those with infinite models may require further research.

Future Developments

  • Richer semantic rule-sets for OBQC in LLM repair loops may further boost achievable repair rates.
  • Automated reduction/optimization of test suite sizes remains an ongoing research area.
  • Generalization to non-monotonic or probabilistic KBs is an open challenge.

OBQC unifies several research threads in semantic technologies, enabling both theoretical guarantees and practical certification of query answering systems with respect to expressive ontological constraints (Grau et al., 2014, Gottlob et al., 2011, Lutz et al., 2020, Amendola et al., 2017, Allemang et al., 20 May 2024). Its logic-based machinery ensures reliability and explainability, particularly in settings emphasizing semantic correctness and scalable deployment of ontology-aware information systems.
