Parameterized Equality Saturation

Updated 18 October 2025

Parameterized Equality Saturation is a global optimization strategy that non-destructively enriches a program’s IR with equivalence relations, eliminating phase ordering issues.
It leverages an E-PEG data structure to simultaneously represent exponentially many program variants and supports a final global selection via integer linear programming.
The approach enables modular, domain-specific, and context-sensitive optimizations across dataflow and control constructs while providing robust translation validation.

Parameterized equality saturation is a global optimization strategy in which equality analyses, defined as parameterized pattern-triggered modules, iteratively saturate a program’s functional intermediate representation with equivalence relations, enabling holistic program optimization without phase ordering constraints. The reference implementation relies on a specialized data structure, the E-PEG (Equality-Program Expression Graph), which compactly encodes exponentially many optimized forms of a program at once and allows a final global selection step via integer linear programming. The framework supports modular, domain-specific, and trigger-parameterized optimizations across both dataflow and control constructs, and serves both as an optimizer and as a foundation for translation validation.

1. Structure of Equality Saturation

Traditional compiler optimizers sequentially apply transformation rules to destructively rewrite an intermediate representation (IR), often resulting in phase ordering problems—where the effect of one optimization unintentionally disables or precludes another. Equality saturation replaces sequential rewriting with a non-destructive, additive process in which each optimization is reified as an “equality analysis”: a procedure that, upon matching its trigger pattern, simply adds an equivalence to the IR rather than modifying the program.
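The phase ordering problem can be made concrete with a toy Python sketch. It applies two hypothetical destructive passes, x * 2 → x << 1 (strength reduction) and (x * y) / y → x (cancellation, valid only under exact, non-overflowing arithmetic), to the term (a * 2) / 2 in both orders. Encoding terms as nested tuples and all function names are illustrative assumptions, not part of any real compiler.

```python
def strength_reduce(t):
    """Destructive pass: rewrite x * 2 to x << 1, bottom-up."""
    if not isinstance(t, tuple):
        return t
    t = (t[0],) + tuple(strength_reduce(c) for c in t[1:])
    if t[0] == "*" and t[2] == 2:
        return ("<<", t[1], 1)
    return t


def cancel(t):
    """Destructive pass: rewrite (x * y) / y to x, bottom-up (exact arithmetic assumed)."""
    if not isinstance(t, tuple):
        return t
    t = (t[0],) + tuple(cancel(c) for c in t[1:])
    if t[0] == "/" and isinstance(t[1], tuple) and t[1][0] == "*" and t[1][2] == t[2]:
        return t[1][1]
    return t


def run_passes(term, passes):
    for p in passes:
        term = p(term)  # the previous form of the program is discarded
    return term


prog = ("/", ("*", "a", 2), 2)
print(run_passes(prog, [strength_reduce, cancel]))  # ('/', ('<<', 'a', 1), 2)
print(run_passes(prog, [cancel, strength_reduce]))  # a
```

In the first order, strength reduction destroys the a * 2 that cancellation needs, so the program stays at (a << 1) / 2; in the second order it reduces to a. An additive approach sidesteps the choice by keeping a * 2 and a << 1 available side by side.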

The key data structure is the E-PEG: a PEG (Program Expression Graph) augmented with equivalence classes. Nodes in the E-PEG are connected not only by ordinary operand (dataflow) dependencies but also by equality edges that record which program fragments have been proven equivalent. Because equivalent fragments share structure instead of being copied, the E-PEG can compactly represent exponentially many optimized versions of the original program.
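The sharing behind this compactness can be sketched with a minimal e-graph in Python, roughly the untyped core of an E-PEG without θ/φ semantics. Class ids are managed by a union-find structure and identical e-nodes are hash-consed; congruence closure is omitted for brevity. The class and method names are hypothetical, and `count` assumes an acyclic graph.

```python
from math import prod


class EGraph:
    """Minimal e-graph: union-find over class ids plus hash-consed e-nodes.
    Congruence closure is omitted, so this only illustrates sharing."""

    def __init__(self):
        self.parent = []  # union-find parent pointer for each class id
        self.memo = {}    # e-node (op, child class ids...) -> class id
        self.nodes = {}   # root class id -> set of e-nodes in that class

    def find(self, c):
        while self.parent[c] != c:
            c = self.parent[c]
        return c

    def add(self, term):
        """Insert a tuple-encoded term bottom-up; identical subterms are shared."""
        if isinstance(term, tuple):
            node = (term[0],) + tuple(self.add(t) for t in term[1:])
        else:
            node = (term,)  # leaf: variable name or constant
        if node not in self.memo:
            cid = len(self.parent)
            self.parent.append(cid)
            self.memo[node] = cid
            self.nodes[cid] = {node}
        return self.find(self.memo[node])

    def union(self, a, b):
        """Record that classes a and b are equal; no e-node is ever deleted."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.nodes[a] |= self.nodes.pop(b)
        return a

    def count(self, c):
        """Number of distinct programs represented by class c (acyclic graphs only)."""
        return sum(prod(self.count(k) for k in node[1:])
                   for node in self.nodes[self.find(c)])


g = EGraph()
root = g.add(("/", ("*", "a", 2), 2))
print(g.count(root))  # 1
g.union(g.add(("*", "a", 2)), g.add(("<<", "a", 1)))
print(g.count(root))  # 2
both = g.add(("+", ("*", "a", 2), ("*", "a", 2)))
print(g.count(both))  # 4
```

A single union multiplies the number of represented programs for every term that uses the merged class, while the graph grows by only a few nodes; this multiplicative effect is what lets an E-PEG hold exponentially many variants.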

2. Parameterization of Equality Analyses

Parameterization enters the equality saturation approach at the level of the equality analyses. Each analysis is defined by:

a trigger pattern, possibly with free variables, which identifies subgraphs of the PEG or E-PEG subject to the potential optimization,
a callback function, which upon a trigger match instantiates and adds equality axioms to the E-PEG.

This design allows optimizations to be modular, reusable, and domain-aware. For example:

An algebraic simplification rule could trigger on any match of the form “a * 0” and parameterize over a, adding the equality a * 0 = 0,
A loop-induction variable strength reduction analysis could be parameterized over loop invariants and induction variables.

This approach naturally extends to sophisticated and context-sensitive optimizations (e.g., inlining, tail-recursion elimination, or domain-specific transformations). The parameters influence how and when new equivalences are instantiated in the graph, and allow control over the propagation and applicability of rules.
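An equality analysis as a (trigger pattern, callback) pair can be sketched in Python, assuming tuple-encoded terms and ?-prefixed pattern variables. The names `EqualityAnalysis`, `match`, `subterms`, and `mul_pow2` are hypothetical and do not reflect Peggy's actual interface. `run` only returns equalities; it never rewrites the program. The `mul_pow2` factory shows parameterization beyond pattern variables: the shift amount k fixes both the trigger and the instantiated equality, giving a genuine strength reduction (x * 2^k = x << k).

```python
from dataclasses import dataclass
from typing import Callable


def match(pat, term, env):
    """Match a tuple-encoded pattern against a term; '?x' strings are pattern variables."""
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in env:
            return env if env[pat] == term else None
        return {**env, pat: term}
    if isinstance(pat, tuple):
        if not (isinstance(term, tuple) and len(pat) == len(term) and pat[0] == term[0]):
            return None
        for p, t in zip(pat[1:], term[1:]):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pat == term else None


def subterms(term):
    yield term
    if isinstance(term, tuple):
        for child in term[1:]:
            yield from subterms(child)


@dataclass
class EqualityAnalysis:
    name: str
    trigger: object                   # pattern with ?-variables
    callback: Callable[[dict], list]  # bindings -> list of (lhs, rhs) equalities

    def run(self, program):
        """Collect the equalities this analysis adds; the program itself is untouched."""
        eqs = []
        for sub in subterms(program):
            env = match(self.trigger, sub, {})
            if env is not None:
                eqs.extend(self.callback(env))
        return eqs


annihilate = EqualityAnalysis(
    "mul-zero", ("*", "?a", 0),
    lambda env: [(("*", env["?a"], 0), 0)])


def mul_pow2(k):
    """Hypothetical analysis factory parameterized by the shift amount k."""
    return EqualityAnalysis(
        f"mul-pow2-{k}", ("*", "?a", 2 ** k),
        lambda env: [(("*", env["?a"], 2 ** k), ("<<", env["?a"], k))])


prog = ("+", ("*", "x", 0), ("*", ("+", "y", 1), 8))
print(annihilate.run(prog))   # [(('*', 'x', 0), 0)]
print(mul_pow2(3).run(prog))  # [(('*', ('+', 'y', 1), 8), ('<<', ('+', 'y', 1), 3))]
```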

3. Intermediate Representation and Treatment of Control Flow

The intermediate representation at the center of equality saturation is the PEG, which encodes all control constructs in a functional, referentially transparent form. Its construction features:

For loops: θ-nodes (“theta”), whose arguments give the initial value and the iterative update, capturing loop-carried dependencies; companion eval and pass nodes extract the value a loop variable holds when the loop exits.
For conditionals: φ-nodes (“phi”), with a Boolean condition and true/false cases as arguments.

This functional IR ensures that adding equalities is non-destructive and preserves referential transparency. During equality saturation, whenever an equality analysis fires, the PEG nodes it relates are merged into one equivalence class, so any member of the class can stand in for the others. The structure thereby represents all discovered optimized versions simultaneously: loop-invariant code motion, loop peeling, branch hoisting, induction variable rewriting, and more.
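A minimal Python evaluator illustrates how θ-nodes capture a loop functionally. The PEG encodes the loop i = 0; s = 0; while i < n: s = s + i; i = i + 1, which then returns s. It is a hypothetical sketch that assumes a single loop, so an iteration index is just an integer, and it names nodes by strings so that θ-nodes can refer back to their own updates. It follows the θ definition given in Section 7 and uses the PEG's eval and pass nodes: pass yields the first iteration at which its condition holds, and eval reads a value at that iteration. The node names and tuple encoding are illustrative assumptions.

```python
from functools import lru_cache


def make_peg(n):
    """Hypothetical PEG for: i = 0; s = 0; while i < n: s = s + i; i = i + 1; return s."""
    return {
        "zero":   ("const", 0),
        "one":    ("const", 1),
        "n":      ("const", n),
        "i":      ("theta", "zero", "i_next"),  # i = θ(0, i + 1)
        "i_next": ("add", "i", "one"),
        "s":      ("theta", "zero", "s_next"),  # s = θ(0, s + i)
        "s_next": ("add", "s", "i"),
        "cond":   ("lt", "i", "n"),
        "exit":   ("not", "cond"),
        "exit_k": ("pass", "exit"),             # first iteration where exit holds
        "result": ("eval", "s", "exit_k"),      # value of s at that iteration
    }


def evaluate(peg, name, k=0):
    """Value of PEG node `name` at iteration k of the single loop."""

    @lru_cache(maxsize=None)
    def val(node, it):
        op, *args = peg[node]
        if op == "const":
            return args[0]
        if op == "theta":  # base at iteration 0, else the update evaluated at it - 1
            base, loop = args
            return val(base, it) if it == 0 else val(loop, it - 1)
        if op == "phi":    # φ(cond, t, f): pick a branch at the same iteration
            c, t, f = args
            return val(t, it) if val(c, it) else val(f, it)
        if op == "add":
            return val(args[0], it) + val(args[1], it)
        if op == "lt":
            return val(args[0], it) < val(args[1], it)
        if op == "not":
            return not val(args[0], it)
        if op == "pass":   # loops forever if the condition never holds
            j = 0
            while not val(args[0], j):
                j += 1
            return j
        if op == "eval":
            return val(args[0], val(args[1], it))
        raise ValueError(f"unknown op {op!r}")

    return val(name, k)


peg = make_peg(5)
print([evaluate(peg, "s", k) for k in range(6)])  # [0, 0, 1, 3, 6, 10]
print(evaluate(peg, "result"))                    # 10
```

Because each node denotes a whole sequence of per-iteration values rather than a mutable variable, merging two nodes into one class never invalidates any other node that refers to them.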

4. Saturation and Global Optimization Heuristic

The equality saturation process is defined operationally as:

a) Conversion: The input program (typically a CFG) is converted to a PEG, then to an E-PEG.

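$\mathrm{Optimize}(cfg) = \mathrm{IrToCfg} \Big( \mathrm{SelectBest} \big( \mathrm{Saturate}(\mathrm{CfgToIr}(cfg)) \big) \Big)$

where $\mathrm{CfgToIr}$ performs this conversion, $\mathrm{Saturate}$ denotes the iterative application of equality analyses (step b), $\mathrm{SelectBest}$ is the global selection (step c), and $\mathrm{IrToCfg}$ converts the selected program back to a CFG.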

b) Saturation Loop: For each equality analysis, all occurrences of the trigger pattern are found (by a dataflow pattern matcher, e.g., a Rete-style network). The corresponding callback generates equality edges to be added to the E-PEG. The process repeats until a fixpoint is reached (no analysis adds a new equality) or an explicit resource bound is met.

c) Global Selection: Once saturated, the E-PEG embodies an exponential family of candidate programs. Rather than perform sequential locally profitable rewrites as in classical compilers, a global profitability heuristic, typically via a pseudo-Boolean or 0–1 ILP solver, selects a single “best” program version:

$C(n) = basic(n) \cdot k^{depth(n)}$

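Here, $basic(n)$ is the cost of the operation at node $n$, $depth(n)$ is its loop-nesting depth, and the constant $k > 1$ makes operations inside loops proportionally more expensive. Additional constraints ensure the selected program is well-formed: the solution picks exactly one representative node from each equivalence class the program needs, every operand of a selected node is itself selected, and any cycle in the result passes through a θ-node, so the program can be converted back to a CFG.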
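The saturation loop of step b) can be sketched with a small e-graph in Python. Everything here is a hypothetical illustration. Terms are nested tuples, and ?-prefixed strings are pattern variables. Each rule is a (trigger, equal term) pair that stands in for a trigger pattern plus a callback adding one equality. Matches are found by naively scanning every class rather than with a Rete-style network. The loop stops at a fixpoint or after `max_iters` rounds, mirroring the resource bound.

```python
class EGraph:
    """Minimal e-graph with e-matching and congruence rebuilding (illustrative only)."""

    def __init__(self):
        self.parent, self.memo, self.nodes = [], {}, {}  # union-find, hash-cons, classes

    def find(self, c):
        while self.parent[c] != c:
            c = self.parent[c]
        return c

    def add_node(self, node):
        node = (node[0],) + tuple(self.find(c) for c in node[1:])
        if node not in self.memo:
            cid = len(self.parent)
            self.parent.append(cid)
            self.memo[node], self.nodes[cid] = cid, {node}
        return self.find(self.memo[node])

    def add(self, pat, env=None):
        """Add a term; ?-variables are replaced by the class ids bound in env."""
        if isinstance(pat, str) and pat.startswith("?"):
            return env[pat]
        if isinstance(pat, tuple):
            return self.add_node((pat[0],) + tuple(self.add(p, env) for p in pat[1:]))
        return self.add_node((pat,))

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False
        self.parent[b] = a
        self.nodes[a] |= self.nodes.pop(b)
        return True

    def rebuild(self):
        """Merge classes whose e-nodes became identical after unions (congruence)."""
        merged = True
        while merged:
            merged, memo = False, {}
            for node, cid in self.memo.items():
                key = (node[0],) + tuple(self.find(c) for c in node[1:])
                if key in memo:
                    merged |= self.union(memo[key], cid)
                else:
                    memo[key] = cid
            self.memo = memo

    def ematch(self, pat, cid, env):
        """Yield bindings (variable -> class id) under which pat matches class cid."""
        cid = self.find(cid)
        if isinstance(pat, str) and pat.startswith("?"):
            if pat not in env:
                yield {**env, pat: cid}
            elif self.find(env[pat]) == cid:
                yield env
            return
        for node in list(self.nodes[cid]):
            if not isinstance(pat, tuple):
                if node == (pat,):
                    yield env
            elif node[0] == pat[0] and len(node) == len(pat):
                yield from self.match_args(pat[1:], node[1:], env)

    def match_args(self, pats, cids, env):
        if not pats:
            yield env
            return
        for env2 in self.ematch(pats[0], cids[0], env):
            yield from self.match_args(pats[1:], cids[1:], env2)


# Hypothetical rules: (trigger pattern, equal term). Cancellation assumes exact arithmetic.
RULES = [
    (("*", "?x", 2), ("<<", "?x", 1)),
    (("/", ("*", "?x", "?y"), "?y"), "?x"),
]


def saturate(g, rules, max_iters=10):
    """Return the number of rounds needed to reach a fixpoint, or None if the bound is hit."""
    for rounds in range(1, max_iters + 1):
        matches = [(cid, rhs, env) for lhs, rhs in rules
                   for cid in list(g.nodes) for env in g.ematch(lhs, cid, {})]
        changed = False
        for cid, rhs, env in matches:
            changed |= g.union(cid, g.add(rhs, env))
        g.rebuild()
        if not changed:
            return rounds
    return None


g = EGraph()
root = g.add(("/", ("*", "a", 2), 2))
print(saturate(g, RULES))                                             # 2
print(g.find(root) == g.find(g.add("a")))                             # True
print(g.find(g.add(("*", "a", 2))) == g.find(g.add(("<<", "a", 1))))  # True
```

Both rules fire in the first round and nothing new appears in the second, so the graph saturates with (a * 2) / 2 and a in one class and a * 2 and a << 1 in another; neither rewrite blocks the other. The same class-membership test, applied to an original and an optimized program inserted into one graph, is the core idea behind using saturation for translation validation.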

5. Benefits: Phase-Order Independence, Expressiveness, and Validation

Elimination of Phase Ordering: All discovered equalities are preserved during saturation, so one optimization never removes the pattern another would have matched.
Holistic, Global Profitability: The ILP-based selection weighs the interaction effects between optimizations, including those involving loops and control flow, jointly under a single cost model rather than one rewrite at a time.
Expressiveness: The IR accommodates branches and loops, enabling unanticipated composite optimizations (e.g., loop peeling with code motion, or combined algebraic simplification with branch hoisting).
Extensibility: New parameterized equality analyses can be added modularly—in practice, domain-specific optimizations (e.g., vector library deforestation, or hardware-specific instruction redirection) are encoded readily by end-users.
Translation Validation: The E-PEG holds all versions of the input program, offering a strong foundation for semantic equivalence checking. The approach has validated the output of independent compilers (e.g., Soot), establishing semantic equivalence (or diagnosing incorrectness) reliably.
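The difference between global and local profitability can be sketched in Python using the cost model $C(n) = basic(n) \cdot k^{depth(n)}$ from step c). An exhaustive search over one representative per class stands in for the 0–1 ILP solver and is feasible only for tiny graphs. It is compared against a greedy baseline in which each class independently picks its cheapest subtree. The graph, the costs, and k = 10 are hypothetical. The graph is acyclic, so the θ-related well-formedness constraints are omitted.

```python
from itertools import product

K = 10  # hypothetical per-loop-level weight k in C(n) = basic(n) * k ** depth(n)

# Hypothetical saturated graph: class -> alternatives (op, basic, depth, child classes).
CLASSES = {
    "root": [("add", 1, 1, ("x", "y"))],
    "x":    [("f", 1, 1, ("s",)), ("x_alt", 3, 1, ())],
    "y":    [("g", 1, 1, ("s",)), ("y_alt", 3, 1, ())],
    "s":    [("inv", 25, 0, ())],  # expensive loop-invariant value, computed once
}


def node_cost(node):
    _, basic, depth, _ = node
    return basic * K ** depth


def program_cost(choice, root="root"):
    """Cost of the program picked by `choice` (class -> node); shared classes count once."""
    seen, stack, total = set(), [root], 0
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            total += node_cost(choice[c])
            stack.extend(choice[c][3])
    return total


def select_global(classes):
    """Exhaustive stand-in for the 0-1 ILP: try every combination of representatives."""
    names = list(classes)
    best_cost, best_choice = None, None
    for picks in product(*(classes[c] for c in names)):
        choice = dict(zip(names, picks))
        cost = program_cost(choice)
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


def select_greedy(classes):
    """Local baseline: each class keeps its cheapest subtree, ignoring sharing."""
    memo = {}

    def tree(c):
        if c not in memo:
            memo[c] = min(((node_cost(n) + sum(tree(ch)[0] for ch in n[3]), n)
                           for n in classes[c]), key=lambda pair: pair[0])
        return memo[c]

    choice = {c: tree(c)[1] for c in classes}
    return program_cost(choice), choice


cost, choice = select_global(CLASSES)
print(cost, {c: n[0] for c, n in choice.items()})  # 55 {'root': 'add', 'x': 'f', 'y': 'g', 's': 'inv'}
print(select_greedy(CLASSES)[0])                   # 70
```

Judged in isolation, f and g each drag in the expensive value s, so the greedy baseline rejects both. The global selection sees that s is paid for only once and picks the cheaper overall program (55 versus 70).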

6. Experimental Results and Scalability

Empirical evaluations of parameterized equality saturation highlight:

Performance: Average method processing time is approximately 1.5 seconds for Java bytecode methods (over thousands of routines), with 84% of methods fully saturating without reaching imposed resource bounds (200 MB heap for JVM-based benchmarks).
Optimization Power: In micro-benchmarks, the approach uncovers compound optimization opportunities that fixed pass orderings miss. On a realistic raytracer benchmark, the globally profitable selection yielded a 7% speedup.
Translation Validation Coverage: Peggy, the implementation of this approach for Java bytecode, validated 98% of over 3,400 Soot-optimized Java methods and identified subtle compiler bugs in the remainder, demonstrating practicality for large-scale semantic validation.
Space and Time Overheads: Saturation overhead is moderate; exponential blowup is mitigated by the choice of IR (equivalent fragments share structure in the E-PEG) and by triggering only on parameter-constrained patterns.

7. Mathematical Formulations and Properties

Key operators are formalized as:

Phi Operator:

$\phi(cond,t,f)(i) = \begin{cases} t(i) & \text{if } cond(i) = \text{true}, \\ f(i) & \text{if } cond(i) = \text{false}. \end{cases}$

Theta Operator:

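$\theta_\ell(base, loop)(i) = \begin{cases} base(i) & \text{if } i(\ell) = 0, \\ loop(i[\ell \mapsto i(\ell)-1]) & \text{if } i(\ell) > 0. \end{cases}$

Here $i$ is an iteration index mapping each loop $\ell$ to its current iteration count, and $i[\ell \mapsto i(\ell)-1]$ is the same index with loop $\ell$ stepped back by one. For example, $x = \theta_\ell(0, x + 1)$ denotes a counter whose value at iteration $i(\ell)$ is $i(\ell)$ itself.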

Global Cost Model: As described, cost functions are parameterized by loop depth, supporting profitable selection among deeply nested loop variants.
Monotonicity: Ensured by the property

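$ir_1 \xrightarrow{a} ir_2 \;\Rightarrow\; ir_1 \sqsubseteq ir_2$

where $ir_1 \xrightarrow{a} ir_2$ means that firing analysis $a$ on $ir_1$ yields $ir_2$. Each step only adds equalities, so $ir_2$ represents every program variant that $ir_1$ does (and possibly more); no information is ever discarded.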
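The monotonicity property can be checked on a small case in Python. The sketch enumerates the programs a class represents before and after one analysis step adds an equality, then confirms that the old set is contained in the new one. Reading ⊑ as "represents a subset of the programs of" is an interpretive assumption. The e-graph below is a minimal hypothetical sketch (union-find plus hash-consed e-nodes, no congruence closure), and `terms` enumerates represented programs only for acyclic graphs.

```python
from itertools import product


class EGraph:
    """Minimal e-graph: union-find plus hash-consed e-nodes (no congruence closure)."""

    def __init__(self):
        self.parent, self.memo, self.nodes = [], {}, {}

    def find(self, c):
        while self.parent[c] != c:
            c = self.parent[c]
        return c

    def add(self, term):
        if isinstance(term, tuple):
            node = (term[0],) + tuple(self.add(t) for t in term[1:])
        else:
            node = (term,)
        if node not in self.memo:
            cid = len(self.parent)
            self.parent.append(cid)
            self.memo[node], self.nodes[cid] = cid, {node}
        return self.find(self.memo[node])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.nodes[a] |= self.nodes.pop(b)

    def terms(self, c):
        """Set of all programs represented by class c (acyclic graphs only)."""
        out = set()
        for node in self.nodes[self.find(c)]:
            if len(node) == 1:
                out.add(node[0])  # leaf: variable or constant
            else:
                for kids in product(*(self.terms(k) for k in node[1:])):
                    out.add((node[0],) + kids)
        return out


g = EGraph()
root = g.add(("+", ("*", "a", 2), "b"))
ir1 = g.terms(root)
g.union(g.add(("*", "a", 2)), g.add(("<<", "a", 1)))  # one analysis step
ir2 = g.terms(root)
print(ir1 <= ir2)  # True: nothing represented before was lost
print(ir2 - ir1)   # {('+', ('<<', 'a', 1), 'b')}
```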

Parameterized equality saturation, as articulated in the E-PEG model, is a framework in which modular, parameter-driven equality analyses infuse a referentially transparent IR with equivalences. This saturation process circumvents phase ordering, supports full-spectrum optimization (arithmetic, control, domain-specific), and enables both the selection of code that is globally optimal with respect to the cost model and semantic translation validation. The approach demonstrates its practicality by scaling to realistic optimization workloads, discovering complex cross-cutting opportunities, and catching subtle compiler bugs, all within carefully bounded computational resources. In this setting, the extensibility and soundness of parameterized equality analyses unify decades of insights from algebraic rewriting, program optimization, and verification into a single operational methodology.
