Equality Saturation Optimization
- Equality saturation is a technique that uses e-graphs to non-destructively capture and merge all equivalent program terms via comprehensive rewrite rules.
- It eliminates phase-ordering problems by applying all potential rewrites in parallel, enabling the extraction of globally optimized representations.
- Recent advancements include categorical generalizations, improved extraction algorithms, and support for variable binding, enhancing scalability in compilers and theorem provers.
Equality saturation is a non-destructive program transformation and optimization technique based on e-graphs, a data structure that compactly represents a large set of equivalent terms under an equational theory. Equality saturation iteratively, and in parallel, applies all possible rewrite rules to terms encoded in the e-graph, merging their equivalence classes, until a fixpoint or resource limit is reached. This approach is widely adopted in compilers, theorem provers, and formal reasoning systems because it eliminates classical phase-ordering problems and enables the systematic discovery of optimized or canonical representations from a vast equivalence space (Tate et al., 2010, Willsey et al., 2020, Suciu et al., 5 Jan 2025, Yang et al., 2021, Cruysse et al., 2023). The recent expansion of e-graph and equality saturation techniques includes categorical generalizations, formal semantic models, advanced extraction algorithms, and support for richer logical features such as case splitting and variable binding.
1. Formal Framework of Equality Saturation
At the core of equality saturation is the e-graph, which maintains a congruence closure over terms built from a signature Σ of operators. An e-graph consists of:
- E-classes: Sets of e-nodes representing terms known to be semantically equal.
- E-nodes: Records of the form f(c_1, …, c_n), with f ∈ Σ and each c_i an e-class ID (Willsey et al., 2020, Yang et al., 2021).
- Hash-consing: A mapping from canonicalized e-nodes to e-class IDs to support fast congruence detection.
- Union-find structure: Maintains the partitioning of e-class IDs and underpins the congruence closure invariant.
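The four components above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of egg or any other cited system: the class name `EGraph` and its methods are hypothetical, e-nodes are plain `(op, children)` tuples, and `rebuild` uses a naive full re-canonicalization pass rather than an optimized worklist.

```python
from dataclasses import dataclass, field


@dataclass
class EGraph:
    """Toy e-graph: union-find over e-class IDs plus a hash-cons of e-nodes."""
    parent: list = field(default_factory=list)    # union-find parent pointers
    hashcons: dict = field(default_factory=dict)  # canonical e-node -> e-class ID

    def find(self, cid):
        # Follow parent pointers to the representative, halving the path.
        while self.parent[cid] != cid:
            self.parent[cid] = self.parent[self.parent[cid]]
            cid = self.parent[cid]
        return cid

    def canonicalize(self, node):
        op, children = node
        return (op, tuple(self.find(c) for c in children))

    def add(self, op, *children):
        # Hash-consing: a structurally identical e-node reuses its e-class.
        node = self.canonicalize((op, children))
        if node not in self.hashcons:
            self.parent.append(len(self.parent))
            self.hashcons[node] = len(self.parent) - 1
        return self.find(self.hashcons[node])

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        # Restore congruence closure: e-nodes that become identical after
        # canonicalization must live in the same e-class.
        changed = True
        while changed:
            changed = False
            fresh = {}
            for node, cid in self.hashcons.items():
                node, cid = self.canonicalize(node), self.find(cid)
                if node in fresh and self.find(fresh[node]) != cid:
                    self.merge(fresh[node], cid)
                    changed = True
                fresh[node] = self.find(cid)
            self.hashcons = fresh


g = EGraph()
a, b = g.add("a"), g.add("b")
fa, fb = g.add("f", a), g.add("f", b)
g.merge(a, b)   # assert a = b
g.rebuild()     # congruence then forces f(a) = f(b)
```

After `rebuild`, `g.find(fa)` and `g.find(fb)` return the same ID, which is the congruence invariant the union-find and hash-cons maintain together.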
The saturation process repeatedly applies all rewrite rules by finding every possible e-match μ (a substitution identifying an instance of the rule's left-hand side ℓ in G), inserting any necessary new e-nodes, and merging their e-classes, followed by a rebuild step to re-establish congruence closure (Willsey et al., 2020, Singher et al., 2023).
Formally, for a rewrite system R and an initial e-graph G_0, the saturation process forms an ascending chain G_0 ⊑ G_1 ⊑ G_2 ⊑ …, with G_{i+1} = T_R(G_i) (where T_R is the inflationary immediate consequence operator), until a fixpoint representing all terms equivalent under R is reached (Suciu et al., 5 Jan 2025). Extraction from this saturated e-graph yields a single representative (often via an ILP or dynamic programming pass) that minimizes a user-supplied cost function.
2. Advantages over Sequential Term Rewriting and Phase Ordering
Traditional optimization pipelines often destructively apply transformation passes in sequence, introducing phase-ordering problems in which the application of one optimization may preclude subsequent beneficial rewrites (Tate et al., 2010). In contrast, equality saturation accumulates all reachable rewrites in a shared, non-destructive e-graph. This ensures:
- Exploration of an exponential equivalence space: All forms generated by rewrite rules are represented simultaneously.
- Invariance to pass ordering: No transformation disables another; all applicable rewrites fire in parallel.
- Global extraction: The best program (or query plan, tensor computation, etc.) is selected post-saturation based on holistic cost modeling.
- Translation validation: Semantic equivalence of two representations can be checked by their classes after saturation (Tate et al., 2010).
These properties are central in robust superoptimization (Yang et al., 2021), functional array idiom recognition (Cruysse et al., 2023), and advanced relational query optimization (Bărbulescu et al., 2024).
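The phase-ordering effect can be seen on the small expression (a * 2) / 2. The sketch below is an illustration under simplifying assumptions: terms are nested tuples, the two rules and all function names are hypothetical, integer overflow is ignored, and the "saturated" space is a plain Python set of terms rather than an e-graph, so it shows the non-destructive idea without the compact sharing.

```python
def strength_reduce(t):
    # x * 2  ->  x << 1
    if isinstance(t, tuple) and t[0] == "*" and t[2] == 2:
        return ("<<", t[1], 1)
    return None


def cancel_mul_div(t):
    # (x * c) / c  ->  x   for a non-zero integer constant c
    if (isinstance(t, tuple) and t[0] == "/" and isinstance(t[1], tuple)
            and t[1][0] == "*" and isinstance(t[2], int) and t[2] != 0
            and t[1][2] == t[2]):
        return t[1][1]
    return None


def rewrite_once(t, rule):
    """Yield every term obtained by applying `rule` at one position of `t`."""
    out = rule(t)
    if out is not None:
        yield out
    if isinstance(t, tuple):
        for i in range(1, len(t)):
            for sub in rewrite_once(t[i], rule):
                yield t[:i] + (sub,) + t[i + 1:]


def destructive_pipeline(t, passes):
    """Run each pass to exhaustion, overwriting the term as it goes."""
    for rule in passes:
        while True:
            nxt = next(rewrite_once(t, rule), None)
            if nxt is None:
                break
            t = nxt
    return t


def saturate(t, rules, max_iters=10):
    """Keep every reachable form; stop at a fixpoint or the iteration limit."""
    seen = {t}
    for _ in range(max_iters):
        new = {u for s in seen for r in rules for u in rewrite_once(s, r)} - seen
        if not new:
            break
        seen |= new
    return seen


def size(t):
    return 1 + sum(size(c) for c in t[1:]) if isinstance(t, tuple) else 1


expr = ("/", ("*", "a", 2), 2)
stuck = destructive_pipeline(expr, [strength_reduce, cancel_mul_div])
lucky = destructive_pipeline(expr, [cancel_mul_div, strength_reduce])
best = min(saturate(expr, [strength_reduce, cancel_mul_div]), key=size)
```

Running strength reduction first leaves `stuck` as (a << 1) / 2, because the cancellation rule no longer matches. The reverse order and the saturated search both reach `a`, and only the saturated search does so independently of rule order.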
3. Data Structures and Algorithms Enabling Scalable Equality Saturation
E-graph performance hinges on amortized, scalable congruence closure and efficient e-matching.
- Amortized rebuilding: Modern systems defer congruence-closure “repairs” between batches of rewrites, coalescing multiple upward merges and avoiding superlinear work (Willsey et al., 2020).
- E-class analyses: E-classes are extended with lattice-based analyses (for types, value ranges, shapes, etc.) to support domain-specific rewrites and analysis-triggered rule firing (Willsey et al., 2020, Zhang et al., 2023).
- Batch and worklist loops: Rule application is managed via worklists to apply all eligible rewrites per batch, avoiding repeated exploration of stable subgraphs (Cruysse et al., 2023).
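As an illustration of an e-class analysis, the sketch below models constant folding as a small lattice: each e-class carries either a known integer or `None` (unknown). The function names `make` and `join` follow the general shape described for e-class analyses, but the code is a hypothetical standalone sketch and does not reproduce any library's API.

```python
def make(op, child_data):
    """Analysis value of a new e-node, computed from its children's values."""
    if isinstance(op, int):          # integer literal
        return op
    if any(d is None for d in child_data):
        return None                  # some operand is not a known constant
    if op == "+":
        return child_data[0] + child_data[1]
    if op == "*":
        return child_data[0] * child_data[1]
    return None                      # variables and unknown operators


def join(a, b):
    """Lattice join applied when two e-classes are merged."""
    if a is None:
        return b
    if b is None:
        return a
    if a != b:
        raise ValueError(f"merged classes disagree on constant: {a} vs {b}")
    return a


# A class holding the variable x has no known value ...
x_data = make("x", [])
# ... until it is merged with the class of the literal 3.
x_data = join(x_data, make(3, []))
# A rule guarded by the analysis can now fold x + 2.
folded = make("+", [x_data, make(2, [])])
```

Here `folded` evaluates to 5. A rewrite that fires only when the analysis value is known is one way analysis-triggered rules can be realized.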
Recent advances address further scalability bottlenecks:
- Guided saturation: Integration with reinforcement learning (Bărbulescu et al., 2024), sketch guides (Koehler et al., 2021), or probabilistic models (Peng et al., 1 Nov 2025) focuses the search and controls e-graph blowup in complex domains.
- Colored E-graphs: Support for hundreds of simultaneous assumptions ("colors") via a layered union-find, sharing a base e-graph and maintaining coarsened congruences for case splitting, solving exponential duplication problems in conditional reasoning (Singher et al., 2023).
4. Extraction Methods: From E-Graphs to Optimized Terms
The extraction phase seeks a minimal-cost representative from the exponential set encoded in the saturated e-graph (Tate et al., 2010, Yang et al., 2021, Wang et al., 2020).
- ILP-based extraction: Introduces binary selection variables for e-nodes, with “one-per-class” and child-coverage constraints. Extensions incorporate dominance relations (for SSA/CFG reuse) (Merckx et al., 24 Feb 2025).
- Greedy/Dynamic Programming Extraction: Linear-time, bottom-up passes compute per-class minimums for local cost models, but must correct for shared subexpression overcounts in general graphs (Hartmann et al., 2024).
- Sketch-constrained extraction: For domain-constrained optimization, only terms matching a user-supplied sketch are eligible for extraction (Koehler et al., 2021).
Advanced variants address side effects, control flow, or multi-objective cost models (Merckx et al., 24 Feb 2025, Merckx et al., 14 May 2025).
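A minimal sketch of greedy bottom-up extraction follows. It assumes a hypothetical dictionary encoding of a saturated e-graph (e-class name to a list of `(op, children)` e-nodes) and a per-operator cost table that is invented for the example. For simplicity it iterates to a fixpoint instead of using a linear-time worklist, and it uses a tree cost, so a shared subexpression is counted once per use.

```python
import math


def extract(egraph, root, node_cost):
    """Return (cost, term) for the cheapest term represented by `root`."""
    best = {cid: (math.inf, None) for cid in egraph}
    changed = True
    while changed:                       # iterate until no class improves
        changed = False
        for cid, nodes in egraph.items():
            for op, children in nodes:
                cost = node_cost(op) + sum(best[c][0] for c in children)
                if cost < best[cid][0]:
                    best[cid] = (cost, (op, children))
                    changed = True

    def build(cid):
        # Costs are strictly positive, so the chosen nodes form no cycle.
        op, children = best[cid][1]
        return op if not children else (op, *(build(c) for c in children))

    return best[root][0], build(root)


# E-graph for (a * 2) / 2 after saturation: class A holds both the original
# division and the variable a; class M holds a * 2 and a << 1.
egraph = {
    "A": [("/", ("M", "TWO")), ("a", ())],
    "M": [("*", ("A", "TWO")), ("<<", ("A", "ONE"))],
    "TWO": [("2", ())],
    "ONE": [("1", ())],
}
costs = {"/": 4, "*": 2, "<<": 1}        # assumed costs; leaves cost 1


def node_cost(op):
    return costs.get(op, 1)


root_cost, root_term = extract(egraph, "A", node_cost)
```

For the root class `A` this selects the leaf `a` at cost 1. Asking for class `M` instead yields the shift form, since `<<` is cheaper than `*` under the assumed costs.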
5. Recent Extensions and Theoretical Developments
5.1. Generalizations: Monoidal and Categorical Semantics
Recent work axiomatizes e-graphs as morphisms in the free semilattice-enriched symmetric monoidal category (SLatt-enriched SMC), generalizing classical equality saturation to arbitrary monoidal settings (including quantum circuits, dataflow, and algebraic structures) (Tiurin et al., 2024). Equivalence is managed by DPOI (double-pushout with interfaces) rewriting on e-hypergraphs: combinatorial structures whose nodes and edges capture both operator application and semilattice joins, and whose isomorphism class absorbs all structural monoidal equalities.
5.2. Handling Binding and Logical Cuts
Native support for variable binding, which is essential for λ-calculus and higher-order reasoning, is realized by modeling e-graphs as morphisms in the free SLatt-enriched closed symmetric monoidal category. This absorbs α-equivalence and β-reduction structurally, avoiding explicit substitutions and De Bruijn indices (Tiurin et al., 1 May 2025). Hierarchical hypergraphs encode the binding and equivalence hierarchy, and DPOI rewriting manipulates these structures soundly and completely with respect to term rewriting modulo SMC, semilattice, and binding laws.
5.3. Context-Sensitive and Conditional Reasoning
Conditional and context-sensitive rewrites (where a rule applies only under certain assumptions or contexts) pose significant challenges. Colored E-graphs encode all color-indexed congruences as layered union-finds, enabling hundreds of simultaneous assumptions with only O(N + ∑U_c) space and computation (Singher et al., 2023).
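The layering idea can be sketched as a shared base union-find plus one sparse union-find per color that records only the extra merges valid under that assumption. This is a simplified reading of the idea, not the data structure from the cited paper: the class names are hypothetical, there is no path compression, and e-nodes and congruence repair are omitted.

```python
class UnionFind:
    """Sparse union-find: IDs absent from `parent` are their own root."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        while self.parent.get(x, x) != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge two classes; return the absorbed root, or None if no-op."""
        a, b = self.find(a), self.find(b)
        self.parent.setdefault(a, a)
        self.parent.setdefault(b, b)
        if a == b:
            return None
        self.parent[b] = a
        return b


class ColoredUnionFind:
    def __init__(self):
        self.base = UnionFind()   # equalities that hold unconditionally
        self.layers = {}          # color -> extra equalities under that color

    def find(self, x, color=None):
        x = self.base.find(x)
        layer = self.layers.get(color)
        if layer is None:
            return x
        # The layer root may be a stale base ID, so canonicalize once more.
        return self.base.find(layer.find(x))

    def union(self, a, b, color=None):
        if color is not None:
            layer = self.layers.setdefault(color, UnionFind())
            layer.union(self.base.find(a), self.base.find(b))
            return
        winner = self.base.find(a)
        loser = self.base.union(a, b)
        if loser is None:
            return
        # Only layers that already mention the absorbed ID need an update.
        for layer in self.layers.values():
            if loser in layer.parent:
                layer.union(winner, loser)


cuf = ColoredUnionFind()
cuf.union("x", "y", color="x>0")   # x = y only under the assumption x>0
cuf.union("y", "z")                # y = z unconditionally
```

Under the color `"x>0"` all three IDs share a representative, while in the base layer only `y` and `z` do. Each layer stores entries only for IDs touched by a colored merge, which is the intuition behind sharing one base structure instead of copying it per assumption.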
6. Applications and Experimental Highlights
Equality saturation is central to many optimization and synthesis domains:
- Compiler and IR optimization: Classical (Tate et al., 2010), high-level Julia IR (Merckx et al., 24 Feb 2025), eqsat dialects in MLIR (Merckx et al., 14 May 2025), and compositional floating-point optimization (Willsey et al., 2020).
- Tensor graph superoptimization: Neural network acceleration via comprehensive rule application and cost-based DAG extraction, with order-of-magnitude speed and quality gains (Yang et al., 2021, Hartmann et al., 2024).
- Linear algebra and relational query plans: SystemML/SPORES optimizes linear algebra by relational encoding and saturation, discovering all known rewrites and novel optimizations with 1.2×–5× speedups (Wang et al., 2020).
- Boolean and hardware reasoning: Symbolic reasoning over gate-level netlists via domain-specific Boolean rule sets, extracting circuits with maximal high-level structure (full adders, etc.) and enormous verification speedups (Yin et al., 8 Apr 2025).
- Automated idiom detection: Minimalist array languages with only a handful of core rewrite rules, latent idiom rules, and equality saturation automatically recognize patterns and map them to highly optimized library calls, delivering up to 20× speedup on matrix multiplications (Cruysse et al., 2023).
- Learning and inference: RL-guided rule selection and hybrid e-graph/LLM workflows for combinatorial optimization (Bărbulescu et al., 2024, Peng et al., 1 Nov 2025).
- Rule inference: Using equality saturation itself to synthesize concise, high-coverage rewrite rule sets (Nandi et al., 2021).
7. Theoretical Foundations, Complexity, and Termination
Equality saturation’s fixpoint semantics is formalized via deterministic tree automata, providing a clean universal model for the set of equalities and justifying the correctness and convergence of saturation (Suciu et al., 5 Jan 2025). There are deep connections to the database chase; termination of equality saturation is RE-complete (single instance), Π₂-complete (for all terms), and undecidable for arbitrary e-graph instances. Syntactic acyclicity of rules (i.e., weak term acyclicity) gives a polynomial-time guarantee of convergence, providing a practical criterion for safe use in optimizers, theorem provers, and program analyzers.
Space complexity is bounded by O(N + ∑U_c) for colored e-graphs versus O(C·N) for naively forking per assumption. Batched congruence-closure and hash-consing ensure near-linear time per operation in practice (Singher et al., 2023, Willsey et al., 2020). Greedy extraction is linear in the number of e-nodes if subexpression sharing is correctly handled; ILP extraction is NP-hard but tractable for moderately sized e-graphs.
References:
- (Tate et al., 2010) Equality Saturation: A New Approach to Optimization
- (Willsey et al., 2020) egg: Fast and Extensible Equality Saturation
- (Suciu et al., 5 Jan 2025) Semantic foundations of equality saturation
- (Cruysse et al., 2023) Latent Idiom Recognition for a Minimalist Functional Array Language using Equality Saturation
- (Yang et al., 2021) Equality Saturation for Tensor Graph Superoptimization
- (Hartmann et al., 2024) Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search
- (Merckx et al., 24 Feb 2025) Equality Saturation for Optimizing High-Level Julia IR
- (Singher et al., 2023) Colored E-Graph: Equality Reasoning with Conditions
- (Tiurin et al., 2024) Equivalence Hypergraphs: DPO Rewriting for Monoidal E-Graphs
- (Tiurin et al., 1 May 2025) E-Graphs With Bindings
- (Merckx et al., 14 May 2025) eqsat: An Equality Saturation Dialect for Non-destructive Rewriting
- (Wang et al., 2020) SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra
- (Nandi et al., 2021) Rewrite Rule Inference Using Equality Saturation
- (Koehler et al., 2021) Sketch-Guided Equality Saturation: Scaling Equality Saturation to Complex Optimizations of Functional Programs
- (Bărbulescu et al., 2024) Learned Graph Rewriting with Equality Saturation: A New Paradigm in Relational Query Rewrite and Beyond
- (Yin et al., 8 Apr 2025) BoolE: Exact Symbolic Reasoning via Boolean Equality Saturation
- (Peng et al., 1 Nov 2025) Equality Saturation Guided by LLMs