- The paper introduces RuleScript, a language that decouples the specification of query rewrite rules from their execution inside a database engine and formally verifies each rule.
- It leverages a relational algebra core and uninterpreted symbols to express rules, enabling automated equivalence proofs via SMT solvers.
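- Rules written once in RuleScript port across engines through adapters, need up to 3× fewer lines of code than Calcite’s imperative implementations, and, once ported, fire on TPC-H queries in CockroachDB and DataFusion (1.5× geometric-mean speedup in CockroachDB).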
An Extensible and Verifiable Language for Query Rewrite Rules
Motivation and Core Contributions
The paper introduces RuleScript, a domain-specific language for specifying logical query rewrite rules that is extensible, formally verifiable, and portable across database engines (2605.05536). It is motivated by persistent problems in how rewrite rules are traditionally developed: they are implemented by hand, tightly coupled to a specific optimizer backend, lack formal correctness checks, and are prone to duplication and subtle semantic bugs, as observed in Apache Calcite, CockroachDB, and commercial engines. This lack of principled verification has led to numerous optimizer-induced errors in production systems.
RuleScript addresses these challenges by separating rule specification from engine execution infrastructure. It provides a relational-algebra-inspired core syntax, supports user-defined uninterpreted symbols (types, functions, plans), and decomposes rewriting into matching and transformation phases. This abstraction enables rules to be maximally general and schema-independent, while correctness is guaranteed by universal rule-level proofs using automated equivalence solvers.
Language Design and Semantics
RuleScript defines rewrite rules as pairs of abstract query plan patterns: a match pattern (qfrom) and a transform pattern (qto), over relational operators such as Projection, Selection, Aggregation, and Join. Rules are written using uninterpreted symbols (typed placeholders for relations, functions, expressions) that enable concise yet general specification. Rule matching corresponds to finding an instantiation for these symbols under which the match pattern fits a concrete plan, and transformation then generates a new plan using the same instantiation for the transform pattern.
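The match-then-transform semantics can be illustrated with a minimal Python sketch. It assumes plans and patterns are nested tuples whose first element is an operator tag, and that matching is purely syntactic; the names `Sym`, `match`, `instantiate`, `rewrite`, and `FILTER_MERGE` are hypothetical and are not RuleScript syntax or API. The example rule merges two stacked selections.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Sym:
    """Uninterpreted symbol: a typed placeholder bound during matching."""
    name: str
    kind: str  # e.g. "plan" or "pred"; recorded but not enforced in this sketch


Binding = Dict[str, Any]
Rule = Tuple[Any, Any]  # (qfrom, qto)


def match(pattern: Any, plan: Any, env: Binding) -> Optional[Binding]:
    """Extend env so that pattern, instantiated by env, equals plan; None if impossible."""
    if isinstance(pattern, Sym):
        if pattern.name in env:  # repeated symbol: must bind to an equal subplan
            return env if env[pattern.name] == plan else None
        return {**env, pattern.name: plan}
    if isinstance(pattern, tuple) and isinstance(plan, tuple):
        if len(pattern) != len(plan):
            return None
        for sub_pattern, sub_plan in zip(pattern, plan):
            result = match(sub_pattern, sub_plan, env)
            if result is None:
                return None
            env = result
        return env
    return env if pattern == plan else None  # operator tags and constants


def instantiate(pattern: Any, env: Binding) -> Any:
    """Build a concrete plan from the transform pattern using the same bindings."""
    if isinstance(pattern, Sym):
        return env[pattern.name]
    if isinstance(pattern, tuple):
        return tuple(instantiate(part, env) for part in pattern)
    return pattern


def rewrite(rule: Rule, plan: Any) -> Optional[Any]:
    """Apply rule at the root of plan; None if qfrom does not match."""
    qfrom, qto = rule
    env = match(qfrom, plan, {})
    return None if env is None else instantiate(qto, env)


# Filter merge: Select(p, Select(q, R)) -> Select(And(p, q), R)
P, Q, R = Sym("p", "pred"), Sym("q", "pred"), Sym("R", "plan")
FILTER_MERGE: Rule = (("Select", P, ("Select", Q, R)), ("Select", ("And", P, Q), R))

plan = ("Select", ("Gt", "a", 1), ("Select", ("Eq", "b", 2), ("Scan", "t")))
print(rewrite(FILTER_MERGE, plan))
# ('Select', ('And', ('Gt', 'a', 1), ('Eq', 'b', 2)), ('Scan', 't'))
```

Because the binding environment is threaded through the whole match, a symbol that appears twice in `qfrom` must bind to equal subplans both times. RuleScript's actual matching also deals with lambda binders and typed symbols, which this sketch leaves out.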
Operators in the language are expressed as typed functions; complex predicates are decomposed using lambda binders, and schema-level abstractions are handled via uninterpreted type symbols. Custom operators, such as SemiJoin and proprietary engine-specific constructs, are incorporated by giving them explicit semantic definitions in terms of RuleScript’s core algebra. Engines can therefore add domain-specific operators without giving up verification or portability.
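As an illustration of defining a custom operator through the core, the following Python sketch gives SemiJoin a semantics built only from a core selection and an existence test, with lambdas playing the role of binders. It assumes SQL-style bag semantics with rows as dictionaries; the helpers `select`, `exists`, `join`, and `semi_join` are hypothetical stand-ins, not RuleScript definitions, and `join` is included only for contrast.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Bag = List[Row]  # a bag (multiset) of rows: duplicates are significant


# Core operators (hypothetical stand-ins for a relational core).
def select(pred: Callable[[Row], bool], rel: Bag) -> Bag:
    """Keep each row satisfying pred, with its multiplicity."""
    return [row for row in rel if pred(row)]


def exists(rel: Bag) -> bool:
    """True if the bag is non-empty."""
    return len(rel) > 0


def join(theta: Callable[[Row, Row], bool], left: Bag, right: Bag) -> Bag:
    """Theta join: one output row per matching (left, right) pair."""
    return [{**l_row, **r_row} for l_row in left for r_row in right if theta(l_row, r_row)]


# Custom operator defined purely via the core: a left row survives
# (once per occurrence) if at least one right row satisfies theta with it.
def semi_join(theta: Callable[[Row, Row], bool], left: Bag, right: Bag) -> Bag:
    return select(lambda l_row: exists(select(lambda r_row: theta(l_row, r_row), right)), left)


def same_customer(order: Row, vip_row: Row) -> bool:
    return order["cust"] == vip_row["id"]


orders = [{"cust": 1}, {"cust": 1}, {"cust": 2}]
vip = [{"id": 1}, {"id": 1}]

print(semi_join(same_customer, orders, vip))  # [{'cust': 1}, {'cust': 1}]
print(len(join(same_customer, orders, vip)))  # 4
```

The contrast with `join` shows why the definition must be stated carefully: under bag semantics, projecting a join back onto the left input would multiply each left row by its number of matches, whereas a semijoin keeps each left row's original multiplicity.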
Verification is performed statically at the rule level: RuleScript translates rule patterns into equivalence queries over all possible instantiations of uninterpreted symbols. This approach leverages QED, an SMT-backed query equivalence solver capable of reasoning universally over types and expressions. Custom operators are expanded to their core semantics before verification, ensuring the correctness of rules involving backend-specific constructs.
By performing rule-level equivalence checks rather than runtime instance-level validation, RuleScript eliminates the runtime overhead commonly seen with instance-wise verification using tools like Cosette or HoTTSQL. Rules are accepted only if semantic equivalence is proven for every valid instantiation—thus correctness holds for all schemas, predicates, and underlying plans without runtime performance penalties.
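To make the quantifier structure of rule-level verification concrete, the Python sketch below checks the rule Select(p, Select(q, R)) ≡ Select(p ∧ q, R) against every predicate pair and every small bag over a tiny value domain, and accepts the rule only if no counterexample exists. This is bounded enumeration, a deliberately weaker substitute for what the text describes: QED proves equivalence symbolically for all instantiations, whereas passing this check is evidence, not a proof. `DOMAIN`, `MAX_BAG`, and all function names are assumptions for illustration.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

DOMAIN = (0, 1, 2)  # tiny value domain; each row is a single value (assumption)
MAX_BAG = 2         # enumerate bags with at most this many rows (assumption)

Pred = Dict[int, bool]  # a predicate given by its truth table over DOMAIN
Bag = Tuple[int, ...]
Side = Callable[[Pred, Pred, Bag], List[int]]


def select(pred: Callable[[int], bool], rel: Iterable[int]) -> List[int]:
    """Selection: keep rows satisfying pred, preserving duplicates."""
    return [row for row in rel if pred(row)]


def all_predicates() -> List[Pred]:
    """Every boolean function on DOMAIN (2 ** len(DOMAIN) of them)."""
    return [dict(zip(DOMAIN, bits)) for bits in product((False, True), repeat=len(DOMAIN))]


def all_bags() -> Iterator[Bag]:
    """Every bag of up to MAX_BAG rows (some bags appear in several orders)."""
    for size in range(MAX_BAG + 1):
        yield from product(DOMAIN, repeat=size)


def find_counterexample(lhs: Side, rhs: Side) -> Optional[Tuple[Pred, Pred, Bag]]:
    """Search all instantiations of (p, q, R) for one where the sides differ as bags."""
    preds = all_predicates()
    for p, q in product(preds, repeat=2):
        for rel in all_bags():
            if Counter(lhs(p, q, rel)) != Counter(rhs(p, q, rel)):
                return p, q, rel
    return None


def accept(lhs: Side, rhs: Side) -> bool:
    """Admit a rule only if the bounded search finds no counterexample."""
    return find_counterexample(lhs, rhs) is None


# qfrom: Select(p, Select(q, R))
def stacked(p: Pred, q: Pred, rel: Bag) -> List[int]:
    return select(lambda x: p[x], select(lambda x: q[x], rel))


# qto: Select(p AND q, R), the correct filter-merge rule
def merged_and(p: Pred, q: Pred, rel: Bag) -> List[int]:
    return select(lambda x: p[x] and q[x], rel)


# A buggy variant that uses OR instead of AND
def merged_or(p: Pred, q: Pred, rel: Bag) -> List[int]:
    return select(lambda x: p[x] or q[x], rel)


print(accept(stacked, merged_and))  # True
print(accept(stacked, merged_or))   # False
```

The check runs once, when a rule is registered, so rewriting a query later incurs no verification cost; the wrong variant with p ∨ q is rejected together with a concrete counterexample.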
Engine-Agnostic Portability and Execution Pipeline
RuleScript introduces an adapter pattern for engine integration. Once verified, a rule can be ported to any target optimizer via a code-generation adapter or interpreter. The execution pipeline consists of:
- Matching: Traversing the plan to find instantiations for abstract symbols in the match pattern.
- Transformation: Applying these instantiations to generate the rewritten plan from the transform pattern.
Adapters decouple RuleScript from backend-specific APIs, enabling rules authored once to be deployed across heterogeneous engines. The study demonstrates this by reimplementing 33 rules from Apache Calcite and deploying them automatically into CockroachDB and Apache DataFusion, with substantial reductions in implementation effort (up to 3× fewer lines of code compared to Calcite).
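The adapter idea can be sketched in Python in the interpreter style: each hypothetical engine supplies an adapter that translates its native plan into an engine-neutral form and back, and a single engine-neutral rule is reused unchanged. Both engine plan formats, the `Adapter` base class, and `merge_filters` are invented for illustration; predicates are passed through opaquely, whereas a real adapter (or a code-generation adapter, which the text also mentions) would translate expressions too.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional, Tuple

Core = Tuple[Any, ...]  # engine-neutral plan: ("Select", pred, child) or ("Scan", table)
CoreRule = Callable[[Core], Optional[Core]]


def merge_filters(plan: Core) -> Optional[Core]:
    """Engine-neutral rule: Select(p, Select(q, R)) -> Select(And(p, q), R)."""
    if plan[0] == "Select" and plan[2][0] == "Select":
        _, p, (_, q, child) = plan
        return ("Select", ("And", p, q), child)
    return None


class Adapter(ABC):
    """Translates between one engine's native plans and the neutral Core form."""

    @abstractmethod
    def to_core(self, native: Any) -> Core: ...

    @abstractmethod
    def from_core(self, core: Core) -> Any: ...

    def apply(self, rule: CoreRule, native: Any) -> Any:
        """Run an engine-neutral rule on a native plan; return it unchanged on no match."""
        rewritten = rule(self.to_core(native))
        return native if rewritten is None else self.from_core(rewritten)


class DictEngineAdapter(Adapter):
    """Hypothetical engine A: plans are dicts with "op" set to "filter" or "scan"."""

    def to_core(self, native: Any) -> Core:
        if native["op"] == "scan":
            return ("Scan", native["table"])
        return ("Select", native["cond"], self.to_core(native["input"]))

    def from_core(self, core: Core) -> Any:
        if core[0] == "Scan":
            return {"op": "scan", "table": core[1]}
        return {"op": "filter", "cond": core[1], "input": self.from_core(core[2])}


class ListEngineAdapter(Adapter):
    """Hypothetical engine B: plans are lists like ["FILTER", cond, child] / ["TABLE", name]."""

    def to_core(self, native: Any) -> Core:
        if native[0] == "TABLE":
            return ("Scan", native[1])
        return ("Select", native[1], self.to_core(native[2]))

    def from_core(self, core: Core) -> Any:
        if core[0] == "Scan":
            return ["TABLE", core[1]]
        return ["FILTER", core[1], self.from_core(core[2])]


plan_a = {"op": "filter", "cond": "a > 1",
          "input": {"op": "filter", "cond": "b = 2", "input": {"op": "scan", "table": "t"}}}
plan_b = ["FILTER", "a > 1", ["FILTER", "b = 2", ["TABLE", "t"]]]

print(DictEngineAdapter().apply(merge_filters, plan_a))
# {'op': 'filter', 'cond': ('And', 'a > 1', 'b = 2'), 'input': {'op': 'scan', 'table': 't'}}
print(ListEngineAdapter().apply(merge_filters, plan_b))
# ['FILTER', ('And', 'a > 1', 'b = 2'), ['TABLE', 't']]
```

Supporting a third engine in this sketch means writing one more `Adapter` subclass while `merge_filters` stays untouched, which mirrors the "no per-rule code modifications" point made for portability.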
Evaluation
The authors evaluate RuleScript across four dimensions:
- Coverage: RuleScript expresses 33 rewrite rules spanning all key relational operator categories and transformation patterns found in Calcite. Limitations in coverage are primarily due to the equivalence solver (QED) and not the RuleScript language itself.
- Portability: Adding support for a new backend only requires implementing an adapter or interpreter, with no per-rule code modifications.
- Implementation Effort: RuleScript rules are significantly more compact than imperative implementations (median 21 lines per rule vs. 94 for Calcite), rivaling dedicated optimizer DSLs such as Optgen.
- Integration: Verified rules, once ported, fire on real-world TPC-H queries in CockroachDB and DataFusion, demonstrating practical optimizer improvements (geometric mean speedup of 1.5× for TPC-H workloads in CockroachDB).
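For reference, a geometric-mean speedup over n queries is conventionally computed as below, where T_i^before and T_i^after denote the runtime of query i without and with the ported rules (notation introduced here, not taken from the paper).

```latex
\mathrm{Speedup}_{\mathrm{GM}} = \left( \prod_{i=1}^{n} \frac{T_i^{\mathrm{before}}}{T_i^{\mathrm{after}}} \right)^{1/n}
```

Unlike an arithmetic mean of ratios, this aggregate is not dominated by a single query with an extreme speedup, and a speedup and a slowdown of the same factor cancel out.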
Implications and Future Directions
RuleScript’s engine-agnostic, formally verified rule specification paradigm has implications for optimizer engineering and research:
- Portability: Rules authored in RuleScript are not tied to specific engines, supporting a “write once, deploy everywhere” philosophy. This solution directly mitigates the proliferation of duplicated, error-prone rule implementations observed in production optimizers.
- Formal Verification: Because rules are stated over uninterpreted symbols and verified universally, a verified RuleScript rule preserves semantics regardless of schema and plan structure.
- Extensibility: The architecture supports engine-specific custom operators and expressions and keeps rules that use them verifiable, provided their semantics are defined in terms of the algebraic core.
- Maintainability: Separation of concerns and reduction in imperative code lead to better maintainability, testability, and auditability of optimizer rule sets.
- Practical Deployment: The automatic translation and integration with diverse engines demonstrate feasibility for adoption in both academic prototypes and commercial products.
Future directions include broadening support for SQL fragments not currently addressed (operators with order-dependent semantics, probabilistic constructs), refining the adapter model, and integrating richer equivalence solvers, so that the rule language can cover windowing, sampling, and other advanced relational constructs.
Conclusion
RuleScript offers an extensible, verifiable, and portable way to specify query rewrite rules. Its abstraction of query transformation logic, formal correctness guarantees, and adapter-based integration model together improve developer productivity and optimizer reliability. The approach works in practice across multiple database engines and lays groundwork for further work on verified query optimization and language-driven rule portability.