Abstract Relational Query Language (ARQL)
- ARQL is a semantics-first reference metalanguage that abstracts core query intent from incidental syntactic details.
- It decomposes queries into a relational core, multiple modalities (textual, ALT, diagrammatic), and environment-level conventions.
- ARQL’s framework supports higher-order constructs and optimization, which makes it a natural intermediate target for LLM-driven query translation and for verifying query semantics.
An Abstract Relational Query Language (ARQL) is a semantics-first, reference metalanguage designed to separate query intent from user-facing syntax in relational data querying. ARQL makes explicit the underlying compositional patterns of a query and serves as a universal reference for both human and machine interpretation of relational query logic. This paradigm departs from surface-level idioms such as SQL keywords or specific syntactic rules, and instead foregrounds the core relational abstractions, alternative modalities for representation, and a clear separation between core semantics and environment-level conventions such as set/bag semantics or null handling (Gatterbauer et al., 15 Dec 2025, Pratten et al., 8 Sep 2025).
1. Motivation and Foundations
For four decades, SQL has served both as a query authoring language and a lingua franca among relational systems, applications, and, more recently, LLMs. However, SQL’s surface syntax conflates abstract relational intent with incidental syntactic details, which leads to ambiguity, redundancy, and difficulty in verifying semantic equivalence, especially in machine-generated SQL. The proliferation of LLMs for program synthesis and query translation amplifies these issues, as human users increasingly act as validators and debuggers rather than authors. In this context, ARQL is advocated as a theoretical and practical foundation for factoring out surface syntax, surfacing compositional query structure, and isolating environment-level conventions to enable clear reasoning, explanation, optimization, and interoperability among diverse query languages (Gatterbauer et al., 15 Dec 2025).
2. Core Components of ARQL
ARQL is characterized by three defining elements:
- Relational Core: A language-agnostic set of primitives for composing and manipulating relations. This includes comprehensions (set/bag), quantifiers, assignment predicates, comparison predicates, grouping operators, join annotations, and access to built-in or external relations.
- Modalities: Multiple lossless presentations of the underlying query structure, supporting diverse use cases from formal machine reasoning to human understanding. Typical modalities include textual comprehension notation, Abstract Language Tree (ALT) hierarchies, and diagrammatic (higraph) forms.
- Conventions: Orthogonal, environment-level parameters that influence observable results but not query intent. Examples include choice of set vs. bag semantics, NULL propagation and initialization, and the interpretation of aggregates over empty sets.
The table below organizes these components:
| Component | Examples | Role |
|---|---|---|
| Relational Core | Comprehensions, quantifiers, grouping, joins | Defines intent and structure of the query |
| Modalities | Textual, ALT, diagrammatic (higraph) | Supports communication and reasoning |
| Conventions | Set/bag mode, NULL handling, logic toggles | Captures environment-level semantics |
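The separation between core and conventions can be illustrated with a minimal Python sketch. It is not taken from the cited papers: the data, the function names, and the two flags (`bag_semantics`, `empty_sum_is_null`) are hypothetical, chosen only to show one query pattern evaluated under different environment-level settings.

```python
from collections import Counter

# Hypothetical toy data: (employee, department) pairs.
EMPLOYEES = [("ann", "sales"), ("bob", "sales"), ("cy", "ops")]


def departments(rows, bag_semantics):
    """Project the department column; the convention decides duplicate handling."""
    projected = [dept for _name, dept in rows]
    if bag_semantics:
        return Counter(projected)      # bag: keep multiplicities
    return Counter(set(projected))     # set: every tuple appears once


def total(values, empty_sum_is_null):
    """Sum a column; the convention decides what an empty input yields."""
    values = list(values)
    if not values and empty_sum_is_null:
        return None                    # SQL-style: SUM over no rows is NULL
    return sum(values)                 # otherwise the empty sum is 0
```

The query pattern (project, then aggregate) is identical in every call; only the flags change the observable result, which is the role the table assigns to conventions.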
3. ARQL Data Models and Algebraic Extensions
The ARQL model, as formalized in recent work, generalizes traditional active-domain relations to encompass complete-domain relations (CDRs) and higher-order solution sets:
- Attribute Domains: Every attribute is associated with a potentially infinite domain (e.g., a numeric domain, Boolean, or a restricted range).
- Domain Relations (DRs): Relations are defined by characteristic functions over the product of their attribute domains, abstracting over actual extensions.
- Active-Domain Relations (ADRs): Finite support relations corresponding to actual SQL tables.
- Complete-Domain Relations (CDRs): Relations with potentially infinite support, enabling queries on infinite domains or symbolic constraints.
- Solution Sets and Exponentiation: Higher-order constructs representing sets of candidate relations or functions (e.g., the set of all mappings from one relation or domain to another). These constructs define search spaces for optimization or combinatorial subset queries (Pratten et al., 8 Sep 2025).
Algebraic operations extend beyond the traditional set of relational operators (union, join, projection) to solution-set formation (exponentiation), groupwise aggregation, and global filtering via higher-order predicates.
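The distinctions above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the formalization of Pratten et al.: the relation `EDGE` and the helper names `join_filter` and `exponentiate` are hypothetical, and exponentiation is shown only for finite inputs.

```python
from itertools import product

# An active-domain relation (ADR): a finite, explicitly stored set of tuples.
EDGE = {(1, 2), (2, 3), (3, 1)}


def edge(x, y):
    """Characteristic function of the ADR above."""
    return (x, y) in EDGE


def less_than(x, y):
    """Characteristic function of a complete-domain relation (CDR) over the
    integers: its support is infinite and is never materialized."""
    return x < y


def join_filter(adr, predicate):
    """Combining a finite relation with a CDR stays finite: the CDR acts as a filter."""
    return {t for t in adr if predicate(*t)}


def exponentiate(domain, codomain):
    """Solution set: every function from a finite domain to a finite codomain,
    each represented as a dict."""
    domain = sorted(domain)
    return [dict(zip(domain, image))
            for image in product(sorted(codomain), repeat=len(domain))]
```

For example, `join_filter(EDGE, less_than)` keeps `(1, 2)` and `(2, 3)`, and `exponentiate({"a", "b"}, {0, 1})` enumerates the four candidate mappings that a global predicate could then filter.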
4. Modalities of Representation
ARQL supports three principal modalities for presenting the same underlying relational core:
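- Comprehension-Style Textual Notation: A form akin to set/bag comprehensions or tuple relational calculus. For example, a comprehension that ranges tuple variables over two relations and keeps only the combinations satisfying a comparison predicate represents a join with predicate.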
- Abstract Language Tree (ALT): A hierarchical syntactic tree with nodes for operators such as QUERY, quantifiers, grouping, assignment, and predicates; edges encode containment and data flow.
- Diagrammatic Hierarchical Graph (Higraph): Nested boxes and edges visually represent quantifier scopes, attributes, groupings, and joins. These diagrammatic forms mirror relational diagrams and higraphs, aiding human understanding and debugging.
Each modality is a lossless, isomorphic representation of the relational core, facilitating both formal reasoning (e.g., via ALT for equivalence checking) and human interpretability (diagrammatic).
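As a rough illustration of one core with several presentations, the following Python sketch stores a join once as a small tree and derives two views from it. The node classes, the field names, and the textual output format are all assumptions made for this example; they are not the ALT grammar or the comprehension notation of the cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantifier:
    var: str        # tuple variable, e.g. "r"
    relation: str   # relation it ranges over, e.g. "R"


@dataclass(frozen=True)
class Comparison:
    left: str
    op: str
    right: str


@dataclass(frozen=True)
class Query:
    head: tuple          # output expressions
    quantifiers: tuple   # Quantifier nodes
    predicates: tuple    # Comparison nodes


def to_text(q):
    """Comprehension-style view of the tree."""
    head = ", ".join(q.head)
    ranges = ", ".join(f"{x.var} in {x.relation}" for x in q.quantifiers)
    conds = " and ".join(f"{p.left} {p.op} {p.right}" for p in q.predicates)
    return f"{{({head}) | {ranges} : {conds}}}"


def to_outline(q):
    """Indented tree view of the same structure."""
    lines = ["QUERY"]
    lines += [f"  HEAD {h}" for h in q.head]
    lines += [f"  QUANTIFIER {x.var} in {x.relation}" for x in q.quantifiers]
    lines += [f"  PREDICATE {p.left} {p.op} {p.right}" for p in q.predicates]
    return "\n".join(lines)


# A join of R and S on attribute B, stored once and rendered twice.
JOIN = Query(
    head=("r.A", "s.C"),
    quantifiers=(Quantifier("r", "R"), Quantifier("s", "S")),
    predicates=(Comparison("r.B", "=", "s.B"),),
)
```

Both renderers read every field of the tree and add nothing of their own, which is the sense in which the views in this sketch are interchangeable.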
5. ARC: Abstract Relational Calculus as a Concrete ARQL
The Abstract Relational Calculus (ARC) is a concrete implementation of ARQL, providing a generalized tuple relational calculus with explicit extensions for grouping, aggregation, join annotations, and flexible modalities (Gatterbauer et al., 15 Dec 2025).
- Syntax: ARC is defined by a BNF incorporating comprehension heads, quantifier lists (including groupings and join annotations), and Boolean formula predicates with assignment, comparisons, aggregates, and negation.
- Semantics: Evaluation proceeds by binding tuple-variables per quantifier, filtering via formulas, and emitting outputs per head assignment, with aggregation and grouping resolved within scope.
- Modalities: ARC is illustrated in all three ARQL modalities, so that its textual, tree, and diagrammatic representations can be converted into one another without loss.
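A representative example in comprehension syntax is the query “for each department, sum the salaries of its employees”, which combines a quantifier over employees, a grouping on the department attribute, and an aggregate assignment in the head.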
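The evaluation steps described under Semantics (bind, filter, group, emit) can be traced for this department query in plain Python. This is a sketch of the evaluation order only, not ARC syntax: the `EMPLOYEE` rows, the attribute names, and the optional `min_salary` filter are hypothetical additions for illustration.

```python
from collections import defaultdict

# Hypothetical input relation.
EMPLOYEE = [
    {"name": "ann", "dept": "sales", "salary": 50},
    {"name": "bob", "dept": "sales", "salary": 70},
    {"name": "cy", "dept": "ops", "salary": 40},
]


def salary_by_department(employee, min_salary=0):
    """For each department, sum the salaries of its employees."""
    groups = defaultdict(list)
    for e in employee:                    # bind the tuple variable e
        if e["salary"] >= min_salary:     # filter via the formula
            groups[e["dept"]].append(e["salary"])   # group on e.dept
    # Emit one output tuple per group: (dept, sum of salaries).
    return {(dept, sum(salaries)) for dept, salaries in groups.items()}
```

With the rows above, `salary_by_department(EMPLOYEE)` yields one tuple per department, and raising `min_salary` removes a department entirely once none of its employees pass the filter.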
6. Interoperability and Comparative Analysis
ARQL and ARC function as "Rosetta Stones" for relational query languages. The language-independent vocabulary (quantifiers, grouping, assignments, comparisons, join annotations) enables direct comparison between disparate paradigms (e.g., SQL, Datalog, pipe algebra) and facilitates translation among them. Defined relations and explicit modularity improve compositional reasoning and reuse. Pattern-convention separation turns design choices (set/bag, HAVING/WHERE, NULL semantics) into explicit toggles in the ARQL interpreter.
In practical terms, ARQL is suitable as a backend target for NL2SQL pipelines: the ALT representation’s small operator set is easy to validate, traverse, and synthesize into other query languages, which helps preserve the intended semantics during translation.
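A minimal sketch of such a pipeline stage is shown below, assuming a hypothetical dictionary encoding of an ALT with the keys `head`, `quantifiers`, `predicates`, and `group_by`. Neither the encoding nor the functions `unbound_variables` and `to_sql` come from the cited papers; they only illustrate that a small operator set keeps validation and SQL generation short.

```python
import re

# Matches references of the form variable.attribute and captures the variable.
REF = re.compile(r"\b([A-Za-z_]\w*)\.[A-Za-z_]\w*")


def unbound_variables(alt):
    """Validation step: variables that are referenced but bound by no quantifier."""
    bound = {var for var, _rel in alt["quantifiers"]}
    texts = list(alt["head"]) + list(alt["predicates"]) + list(alt.get("group_by", []))
    used = {m.group(1) for text in texts for m in REF.finditer(text)}
    return used - bound


def to_sql(alt):
    """Synthesis step: render the tree as a single SELECT statement."""
    select = ", ".join(alt["head"])
    sources = ", ".join(f"{rel} AS {var}" for var, rel in alt["quantifiers"])
    sql = f"SELECT {select} FROM {sources}"
    if alt["predicates"]:
        sql += " WHERE " + " AND ".join(alt["predicates"])
    if alt.get("group_by"):
        sql += " GROUP BY " + ", ".join(alt["group_by"])
    return sql


# Hypothetical tree for "for each department, sum the salaries of its employees".
DEPT_SUM = {
    "head": ["e.dept", "SUM(e.salary)"],
    "quantifiers": [("e", "Employee")],
    "predicates": [],
    "group_by": ["e.dept"],
}
```

A generated tree would first be checked with `unbound_variables` and rendered with `to_sql` only if the result is empty, so that a reference to an undeclared tuple variable is caught before any SQL reaches the backend.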
7. Expressiveness, Safety, and Applications
ARQL unifies traditional set-based queries, combinatorial subset selection, and optimization within a single algebraic system. Notably, ARQL admits NP-complete and NP-hard queries when using complete-domain relations and solution-set exponentiation, as in encoding 3-SAT or resource-constrained optimization problems. When restricted to ADRs (finite relations), ARQL guarantees safety and finite results. Homomorphic translation semantics enable projection of higher-order ARQL constructs—such as solution sets—back into ordinary relational algebra, facilitating execution on standard SQL backends (Pratten et al., 8 Sep 2025).
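The 3-SAT encoding mentioned above can be sketched in Python as a solution set followed by a global filter. This is a brute-force illustration with exponential cost, not the translation described by Pratten et al.; the clause encoding as `(variable, polarity)` pairs and the function names are assumptions made for the example.

```python
from itertools import product


def solution_set(variables):
    """All truth assignments: the exponentiation {False, True}^variables."""
    variables = sorted(variables)
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))


def satisfies(assignment, clauses):
    """Global predicate: every clause contains at least one true literal."""
    return all(
        any(assignment[var] == polarity for var, polarity in clause)
        for clause in clauses
    )


def models(clauses):
    """Filter the solution set down to the satisfying assignments."""
    variables = {var for clause in clauses for var, _polarity in clause}
    return [a for a in solution_set(variables) if satisfies(a, clauses)]


# (x or y or not z) and (not x or y or z) and (not x or not y or not z)
CLAUSES = [
    (("x", True), ("y", True), ("z", False)),
    (("x", False), ("y", True), ("z", True)),
    (("x", False), ("y", False), ("z", False)),
]
```

Each clause here rules out exactly one of the eight assignments, so `models(CLAUSES)` returns the remaining five. The same two-step shape (enumerate a solution set, then filter or rank it) also covers subset selection and resource-constrained optimization.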
Applications include:
- Universally characterizing and comparing query languages and extensions,
- Serving as an intermediate representation for LLM-driven query generation and verification,
- Expressing complex optimization and subset queries (e.g., batch selection, combinatorial assignment) directly within extended SQL paradigms,
- Mechanically translating advanced algebraic queries into standard RA for evaluation by traditional relational engines or constraint solvers.
ARQL, as instantiated by ARC and augmented with higher-order algebraic constructs, marks a shift toward a unified, semantics-driven approach to relational query specification, translation, and optimization. It provides the missing vocabulary for modular relational reasoning, modality-agnostic translation, and transparent interoperability across a heterogeneous landscape of query languages and backend systems (Gatterbauer et al., 15 Dec 2025, Pratten et al., 8 Sep 2025).