An Extensible and Verifiable Language for Query Rewrite Rules

Published 7 May 2026 in cs.DB | (2605.05536v1)

Abstract: Logical query plan rewriting transforms a relational database query into an equivalent but more efficient form and is crucial to the performance of database-backed applications. In existing systems, rewrite rules are typically implemented manually, tightly coupled to specific execution engines, and often lack formal correctness guarantees. Consequently, developing a new engine requires reimplementing both legacy and new rules, which incurs significant engineering cost and limits portability; every new implementation is also an opportunity to introduce new bugs. We introduce Rulescript, an engine-agnostic domain-specific language (DSL) for developing query rewrite rules. Rulescript separates rule definition from execution infrastructure via a relational algebra-inspired core language and an explicit decomposition of rules into matching and transformation phases. Developers express rewrites by pattern-matching query plans using Rulescript's core operators and constructing semantically equivalent transformed plans, with all rewrites automatically verified formally to ensure correctness. Rulescript is extensible: users can define custom operators in terms of the core language to capture engine-specific semantics. To integrate with an existing system, developers need only implement a lightweight adapter that maps Rulescript's core and custom operators to the operators implemented in the target engine. We evaluate Rulescript by reimplementing 33 rewrite rules from Apache Calcite and extending the language with several custom operators. To demonstrate portability, we automatically deploy these rules to CockroachDB and Apache DataFusion, two engines with substantially different backends. Our results show that Rulescript enables a "write once, deploy everywhere" paradigm for query plan rewriting, with minimal effort required to deploy previously written rules on a new data engine.

Summary

  • The paper introduces RuleScript, a language that decouples query rewrite rule specification from engine execution, ensuring formal correctness.
  • It leverages a relational algebra core and uninterpreted symbols to express rules, enabling automated equivalence proofs via SMT solvers.
  • RuleScript achieves engine-agnostic portability, reducing implementation effort by up to 3× and enhancing optimizer reliability on TPC-H queries.

Motivation and Core Contributions

The paper introduces RuleScript, a domain-specific language for specifying logical query rewrite rules with extensibility, formal verification, and portability across database engines (2605.05536). The motivation stems from prevailing deficiencies in traditional query rewrite rule development: rules are manually implemented, tightly coupled to optimizer backends, lack formal correctness checks, and are susceptible to duplication and subtle semantic bugs as seen across systems including Apache Calcite, CockroachDB, and commercial engines. The absence of principled correctness verification has resulted in numerous optimizer-induced errors reported in production systems.

RuleScript addresses these challenges by separating rule specification from engine execution infrastructure. It provides a relational-algebra-inspired core syntax, supports user-defined uninterpreted symbols (types, functions, plans), and decomposes rewriting into matching and transformation phases. This abstraction enables rules to be maximally general and schema-independent, while correctness is guaranteed by universal rule-level proofs using automated equivalence solvers.

Language Design and Semantics

RuleScript defines rewrite rules as pairs of abstract query plan patterns: a match pattern ($q_{\text{from}}$) and a transform pattern ($q_{\text{to}}$), over relational operators such as Projection, Selection, Aggregation, and Join. Rules are written using uninterpreted symbols (typed placeholders for relations, functions, expressions) that enable concise yet general specification. Rule matching corresponds to finding an instantiation for these symbols under which the match pattern fits a concrete plan, and transformation then generates a new plan using the same instantiation for the transform pattern.
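The match-then-transform idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RuleScript syntax: plans are modeled as nested tuples, and the names `Sym`, `match`, `instantiate`, `rewrite`, and `FILTER_MERGE` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    """Uninterpreted symbol: a named placeholder for a relation or expression."""
    name: str

def match(pattern, plan, binding=None):
    """Return a symbol binding under which `pattern` equals `plan`, or None."""
    binding = dict(binding or {})
    if isinstance(pattern, Sym):
        if pattern.name in binding and binding[pattern.name] != plan:
            return None  # a symbol must bind to the same subplan everywhere
        binding[pattern.name] = plan
        return binding
    if isinstance(pattern, tuple):
        if not isinstance(plan, tuple) or len(plan) != len(pattern):
            return None
        for sub_pattern, sub_plan in zip(pattern, plan):
            binding = match(sub_pattern, sub_plan, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == plan else None  # operator names and literals

def instantiate(pattern, binding):
    """Build a concrete plan from `pattern` by substituting bound symbols."""
    if isinstance(pattern, Sym):
        return binding[pattern.name]
    if isinstance(pattern, tuple):
        return tuple(instantiate(p, binding) for p in pattern)
    return pattern

def rewrite(rule, plan):
    """Apply (q_from, q_to) at the plan root; return the plan unchanged if no match."""
    q_from, q_to = rule
    binding = match(q_from, plan)
    return plan if binding is None else instantiate(q_to, binding)

# Hypothetical rule: Filter(p, Filter(q, R)) -> Filter(And(p, q), R)
FILTER_MERGE = (
    ("Filter", Sym("p"), ("Filter", Sym("q"), Sym("R"))),
    ("Filter", ("And", Sym("p"), Sym("q")), Sym("R")),
)

plan = ("Filter", ("Gt", "a", 1), ("Filter", ("Lt", "b", 5), ("Scan", "t")))
rewritten = rewrite(FILTER_MERGE, plan)
# rewritten == ("Filter", ("And", ("Gt", "a", 1), ("Lt", "b", 5)), ("Scan", "t"))
```

The same binding is used for both patterns, which is what makes the rule independent of any particular schema or predicate. A fuller implementation would also try the rule at every subplan and carry type information on each symbol; both are omitted here.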

Operators in the language are expressed as typed functions; complex predicates are decomposed using lambda binders, and schema-level abstractions are handled via uninterpreted type symbols. Custom operators—such as SemiJoin and proprietary engine-specific constructs—are incorporated by explicit semantic definitions in terms of RuleScript’s core algebra. This approach greatly enhances extensibility: engines can add domain-specific operators without compromising verification or portability.
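As a rough sketch of what defining a custom operator in terms of a core algebra could look like, the Python snippet below expands a custom node into core operators before any further processing. The registry `CUSTOM_OPS`, the `$name` template convention, the `Exists` expression, and the particular SemiJoin definition are all assumptions made for illustration; the paper's actual definitions may differ.

```python
# Hypothetical registry: custom operator -> (parameter names, core-algebra template).
# "$name" strings in a template are replaced by the corresponding argument.
CUSTOM_OPS = {
    # Assumed definition: keep each left row for which some right row satisfies cond.
    "SemiJoin": (
        ("cond", "left", "right"),
        ("Filter", ("Exists", ("Filter", "$cond", "$right")), "$left"),
    ),
}

def substitute(template, args):
    """Replace "$name" placeholders in a template with the supplied arguments."""
    if isinstance(template, str) and template.startswith("$"):
        return args[template[1:]]
    if isinstance(template, tuple):
        return tuple(substitute(t, args) for t in template)
    return template

def expand(plan):
    """Recursively rewrite custom operators into their core definitions."""
    if not isinstance(plan, tuple):
        return plan
    head, *rest = plan
    rest = [expand(child) for child in rest]
    if head in CUSTOM_OPS:
        params, template = CUSTOM_OPS[head]
        # Expand again in case the template itself mentions custom operators.
        return expand(substitute(template, dict(zip(params, rest))))
    return (head, *rest)

plan = ("Project", ("a",),
        ("SemiJoin", ("Eq", "a", "b"), ("Scan", "t"), ("Scan", "u")))
core_plan = expand(plan)
```

After expansion, `core_plan` contains only core operators, so anything that understands the core algebra (a verifier, for instance) can process a rule that mentions SemiJoin without knowing about it directly.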

Formal Correctness Verification

Verification is performed statically at the rule level: RuleScript translates rule patterns into equivalence queries over all possible instantiations of uninterpreted symbols. This approach leverages QED, an SMT-backed query equivalence solver capable of reasoning universally over types and expressions. Custom operators are expanded to their core semantics before verification, ensuring the correctness of rules involving backend-specific constructs.

By performing rule-level equivalence checks rather than runtime instance-level validation, RuleScript eliminates the runtime overhead commonly seen with instance-wise verification using tools like Cosette or HoTTSQL. Rules are accepted only if semantic equivalence is proven for every valid instantiation—thus correctness holds for all schemas, predicates, and underlying plans without runtime performance penalties.
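To make "equivalent for every instantiation" concrete, the Python sketch below checks a rule by brute force over a small, bounded space of inputs. This is explicitly not what QED does: an SMT-backed solver proves equivalence for all instantiations, whereas this stand-in only searches for counterexamples within a bound, and it ignores NULLs and multi-column rows. All names here (`DOMAIN`, `find_counterexample`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

DOMAIN = (0, 1, 2)  # assumed tiny value domain; each row is a single value

def all_bags(max_size):
    """Yield every bag (multiset) of rows over DOMAIN with at most max_size rows."""
    for size in range(max_size + 1):
        yield from combinations_with_replacement(DOMAIN, size)

def all_predicates():
    """Yield every total Boolean predicate over DOMAIN, as a lookup function."""
    for truth in product((False, True), repeat=len(DOMAIN)):
        yield dict(zip(DOMAIN, truth)).__getitem__

def select(pred, rows):
    """Selection under bag semantics: keep rows satisfying pred, with duplicates."""
    return [r for r in rows if pred(r)]

def find_counterexample(lhs, rhs, max_size=3):
    """Return a bag on which lhs and rhs differ under bag semantics, else None."""
    for rows in all_bags(max_size):
        for p in all_predicates():
            for q in all_predicates():
                if Counter(lhs(p, q, rows)) != Counter(rhs(p, q, rows)):
                    return rows
    return None

def stacked(p, q, rows):   # Filter(p, Filter(q, R))
    return select(p, select(q, rows))

def merged(p, q, rows):    # Filter(p AND q, R)
    return select(lambda r: p(r) and q(r), rows)

def wrong(p, q, rows):     # Filter(p OR q, R): deliberately incorrect rewrite
    return select(lambda r: p(r) or q(r), rows)

print(find_counterexample(stacked, merged))             # None
print(find_counterexample(stacked, wrong) is not None)  # True
```

The predicates `p` and `q` and the relation `rows` play the role of uninterpreted symbols: the check never fixes them, it ranges over them. Because the check runs once per rule rather than once per query, its cost is paid at authoring time, which mirrors the rule-level (as opposed to instance-level) approach described above.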

Engine-Agnostic Portability and Execution Pipeline

RuleScript introduces an adapter pattern for engine integration. Once verified, a rule can be ported to any target optimizer via a code-generation adapter or interpreter. The execution pipeline consists of:

  • Matching: Traversing the plan to find instantiations for abstract symbols in the match pattern.
  • Transformation: Applying these instantiations to generate the rewritten plan from the transform pattern.

Adapters decouple RuleScript from backend-specific APIs, enabling rules authored once to be deployed across heterogeneous engines. The study demonstrates this by reimplementing 33 rules from Apache Calcite and deploying them automatically into CockroachDB and Apache DataFusion, with substantial reductions in implementation effort (up to 3× fewer lines of code compared to Calcite).
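One way to picture the adapter idea is a pair of conversions between an engine's plan representation and the core operators, with the rule itself written only against the core. The Python sketch below uses an imaginary engine whose plans are dictionaries; the `Adapter` interface, `ToyEngineAdapter`, the operator names, and the hard-coded `merge_filters` rule are all hypothetical and do not reflect the APIs of CockroachDB, DataFusion, or RuleScript.

```python
from abc import ABC, abstractmethod

class Adapter(ABC):
    """Hypothetical adapter interface between one engine and the core algebra."""

    @abstractmethod
    def to_core(self, node):
        """Convert an engine plan node into a core-operator tuple."""

    @abstractmethod
    def from_core(self, plan):
        """Convert a core-operator tuple back into an engine plan node."""

class ToyEngineAdapter(Adapter):
    # Made-up operator names for an imaginary engine, not any real system's API.
    TO_CORE = {"select": "Filter", "scan": "Scan", "and": "And"}
    FROM_CORE = {core: engine for engine, core in TO_CORE.items()}

    def to_core(self, node):
        if not isinstance(node, dict):
            return node  # leaf: predicate text or table name
        return (self.TO_CORE[node["op"]], *(self.to_core(a) for a in node["args"]))

    def from_core(self, plan):
        if not isinstance(plan, tuple):
            return plan
        head, *args = plan
        return {"op": self.FROM_CORE[head], "args": [self.from_core(a) for a in args]}

def merge_filters(plan):
    """Engine-independent rule: Filter(p, Filter(q, R)) -> Filter(And(p, q), R)."""
    if isinstance(plan, tuple) and plan[0] == "Filter":
        _, p, child = plan
        if isinstance(child, tuple) and child[0] == "Filter":
            _, q, source = child
            return ("Filter", ("And", p, q), source)
    return plan

def apply_rule(adapter, rule, engine_plan):
    """Lift the engine plan to the core, apply the rule, and lower the result."""
    return adapter.from_core(rule(adapter.to_core(engine_plan)))

engine_plan = {"op": "select", "args": [
    "a > 1",
    {"op": "select", "args": ["b < 5", {"op": "scan", "args": ["t"]}]},
]}
result = apply_rule(ToyEngineAdapter(), merge_filters, engine_plan)
```

Supporting a second engine in this sketch would mean writing another `Adapter` subclass while `merge_filters` stays untouched, which is the property the paragraph above describes. A code-generating adapter would instead emit engine-native rule code ahead of time; that variant is not shown.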

Evaluation

The authors evaluate RuleScript across four dimensions:

  1. Coverage: RuleScript expresses 33 rewrite rules spanning all key relational operator categories and transformation patterns found in Calcite. Limitations in coverage are primarily due to the equivalence solver (QED) and not the RuleScript language itself.
  2. Portability: Adding support for a new backend only requires implementing an adapter or interpreter, with no per-rule code modifications.
  3. Implementation Effort: RuleScript rules are significantly more compact than imperative implementations (median 21 lines per rule vs. 94 for Calcite), rivaling dedicated optimizer DSLs such as Optgen.
  4. Integration: Verified rules, once ported, fire on real-world TPC-H queries in CockroachDB and DataFusion, demonstrating practical optimizer improvements (geometric mean speedup of 1.5× for TPC-H workloads in CockroachDB).

Implications and Future Directions

RuleScript’s engine-agnostic, formally verified rule specification paradigm has implications for optimizer engineering and research:

  • Portability: Rules authored in RuleScript are not tied to specific engines, supporting a “write once, deploy everywhere” philosophy. This solution directly mitigates the proliferation of duplicated, error-prone rule implementations observed in production optimizers.
  • Formal Verification: By leveraging uninterpreted symbols and universal verifiability, RuleScript rules guarantee semantics-preservation independent of schema and plan structure.
  • Extensibility: The architecture supports engine-specific custom operators and expressions, with formal reasoning as long as semantic definitions are provided in terms of the algebraic core.
  • Maintainability: Separation of concerns and reduction in imperative code lead to better maintainability, testability, and auditability of optimizer rule sets.
  • Practical Deployment: The automatic translation and integration with diverse engines demonstrate feasibility for adoption in both academic prototypes and commercial products.

Theoretical directions include broadening support for SQL fragments not currently addressed (operators with order-dependent semantics, probabilistic constructs), refinement of the adapter model, and integration with richer equivalence solvers—so that the rule language can cover windowing, sampling, and other advanced relational constructs.

Conclusion

RuleScript represents a robust solution for extensible, verifiable, and portable query rewrite rules. Its abstraction of query transformation logic, formal correctness guarantees, and adapter-based integration model collectively provide significant improvements in developer productivity and optimizer reliability. The approach has practical efficacy across multiple database frameworks, and its theoretical paradigm lays the groundwork for further advancements in verified query optimization and language-driven rule portability.
