Colored E-Graphs
- Colored E-graphs are generalized data structures that encode multiple coarsened congruence relations within a unified framework.
- They employ a layered union-find structure to share base merges while tracking local colored merges under case-specific assumptions.
- Optimized colored E-graphs significantly lower memory overhead and improve runtime performance in SMT solving, program optimization, and theorem proving.
A colored E-graph is a generalized data structure that encodes multiple coarsened congruence relations, or multiple "possible worlds," in a single e-graph-like representation. Originally developed to improve the efficiency of equality reasoning with case splits in formal logic and program analysis, colored E-graphs achieve substantial asymptotic and practical resource savings by sharing structure across variants induced by different assumptions (Singher et al., 2023).
1. Model: Standard and Colored E-Graphs
A standard e-graph represents a congruence relation $\cong$ over a set of ground terms built from a signature $\Sigma$. This is achieved by maintaining equivalence classes (e-classes) over subterms and supporting congruence closure under function application. Conventional e-graphs are effective for equality saturation under unconditional rewrite rules: repeated application of universal equalities until closure.
Conditional rewrite rules, which apply only when a side condition holds, compel one to consider multiple, mutually inconsistent sets of assumptions or "branches". Traditionally, this is handled by duplicating the entire e-graph per case, leading to a time and memory blow-up that grows with the number of case splits. Each duplicate processes future rewrites and queries independently.
Colored E-graphs introduce a layered structure. The base (root, "black") e-graph represents the original congruence. For each assumption (modeled by a "color"), a colored congruence is built that coarsens the base: every merge in the root automatically applies to all colored variants, but colored merges (arising from additional case-specific assumptions or equalities) are tracked and enforced locally in each colored layer. The key structural constraint is that for each color $c$, the colored congruence $\cong_c$ is a coarsening of the root congruence $\cong$, i.e., $\cong \subseteq \cong_c$ (Singher et al., 2023).
2. Data Structures, Core Operations, and Algorithms
A colored E-graph consists of:
- E-node pool representing shared subterms; hash cons table and parent map for the root.
- Root union-find maintaining the root congruence $\cong$.
- For each color $c$, a union-find operating over representatives of $\cong$, with a color-local hash cons and parent map.
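The components listed above can be laid out roughly as follows. This is a minimal sketch, not the layout used by Singher et al.: the names `RootLayer`, `ColorLayer`, `ColoredEGraph`, and `add` are hypothetical, e-class ids are plain integers, and colors are identified by strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# An e-node is an operator applied to child e-class ids (assumed encoding).
ENode = Tuple[str, Tuple[int, ...]]


@dataclass
class RootLayer:
    """Shared e-node pool with the root hash cons, parent map, and union-find."""
    nodes: List[ENode] = field(default_factory=list)
    hashcons: Dict[ENode, int] = field(default_factory=dict)
    parents: Dict[int, List[int]] = field(default_factory=dict)
    union_find: Dict[int, int] = field(default_factory=dict)

    def add(self, op: str, *children: int) -> int:
        """Insert an e-node once; children must be ids returned by earlier calls."""
        key = (op, tuple(children))
        if key in self.hashcons:          # already present: reuse the shared node
            return self.hashcons[key]
        eid = len(self.nodes)
        self.nodes.append(key)
        self.hashcons[key] = eid
        self.parents[eid] = []
        self.union_find[eid] = eid        # every new node starts in its own class
        for child in children:
            self.parents[child].append(eid)
        return eid


@dataclass
class ColorLayer:
    """Per-color state: only what differs from the root is stored here."""
    union_find: Dict[int, int] = field(default_factory=dict)   # over root representatives
    hashcons: Dict[ENode, int] = field(default_factory=dict)   # color-local hash cons
    parents: Dict[int, List[int]] = field(default_factory=dict)


@dataclass
class ColoredEGraph:
    root: RootLayer = field(default_factory=RootLayer)
    colors: Dict[str, ColorLayer] = field(default_factory=dict)


g = ColoredEGraph()
x = g.root.add("x")
y = g.root.add("y")
f = g.root.add("f", x, y)
g.colors["x<y"] = ColorLayer()            # a fresh color starts empty
```

Adding a color allocates only empty maps, which is the point of the layering: the e-node pool is never copied.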
Operations are as follows:
- Insertions: Always performed on the shared root, building new e-nodes as needed.
- Root merges: Merging two e-classes in the root unifies them for every color; all colored union-finds automatically inherit these merges.
- Colored merges: Within a color $c$, only representatives of the root congruence are merged in that color's union-find, and the change is recorded for that color alone.
- Rebuild/congruence closure: Two phases: a global rebuild (root), followed by a color-local rebuild that propagates colored merges and updates the colored hash cons and parent maps.
- E-matching: Patterns are considered first on the root; colored variants only require supplemental matching where colored merges yield additional equivalences not observed in the root.
The structure as a whole ensures a single representation of each node and maximal sharing of unchanged results across case splits or assumption branches, since only the minimal set of colored merges and colored e-nodes needs to be maintained per color.
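The interaction between root merges and colored merges can be sketched with a two-level union-find. This is a simplified illustration under stated assumptions, not the algorithm from Singher et al.: it omits e-nodes, hash consing, and congruence closure, and the class and method names (`UnionFind`, `LayeredUnionFind`, `merge_root`, `merge_colored`, `equal`) are hypothetical. A colored lookup first canonicalizes in the root and then in the color's own union-find; a root merge touches a color only if that color already mentions the absorbed representative.

```python
class UnionFind:
    """Plain union-find over hashable ids, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the classes of a and b; return (winner, loser) representatives."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
        return ra, rb


class LayeredUnionFind:
    def __init__(self):
        self.root = UnionFind()
        self.colors = {}  # color name -> UnionFind over root representatives

    def add_color(self, color):
        self.colors.setdefault(color, UnionFind())

    def find(self, x, color=None):
        r = self.root.find(x)
        if color is None:
            return r
        return self.colors[color].find(r)

    def merge_root(self, a, b):
        winner, loser = self.root.union(a, b)
        if winner == loser:
            return
        # Colors that never referred to the absorbed representative need no work.
        for uf in self.colors.values():
            if loser in uf.parent:
                uf.union(winner, loser)

    def merge_colored(self, color, a, b):
        # Colored merges are stored between root representatives only.
        self.colors[color].union(self.root.find(a), self.root.find(b))

    def equal(self, a, b, color=None):
        return self.find(a, color) == self.find(b, color)


luf = LayeredUnionFind()
luf.add_color("x<y")
luf.add_color("x>=y")
luf.merge_root("a", "b")              # visible in every color
luf.merge_colored("x<y", "b", "c")    # visible only under "x<y"
luf.merge_root("e", "a")              # later root merge keeps the colored merge intact
```

After these calls, `a`, `b`, and `e` are equal everywhere, while `c` joins them only under the color `"x<y"`.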
3. Complexity Analysis and Optimization
The primary asymptotic saving derives from the fact that under $k$ case splits over a root of $n$ nodes, naive cloning uses $O(k \cdot n)$ memory and time for storage and repeated rule application. Colored E-graphs require only $O(n + \sum_c m_c)$ space, with $m_c$ being the number of unique colored merges and colored e-nodes for color $c$, typically far less than $n$ (Singher et al., 2023).
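As a rough worked instance of this comparison, write $k$ for the number of case splits, $n$ for the root size, and $m_c$ for the colored material stored for color $c$ (symbols assumed here for illustration). The numbers below are made up to show the arithmetic and are not measurements from the source.

```latex
% Illustrative values only: k = 8, n = 10^5, and m_c = 10^3 for every color c.
\begin{align*}
\text{naive clones:} \quad & k \cdot n = 8 \cdot 10^{5} \\
\text{colored:} \quad & n + \sum_{c=1}^{k} m_c = 10^{5} + 8 \cdot 10^{3} = 1.08 \cdot 10^{5} \\
\text{ratio:} \quad & \frac{8 \cdot 10^{5}}{1.08 \cdot 10^{5}} \approx 7.4
\end{align*}
```

The advantage shrinks as the $m_c$ approach $n$, that is, when an assumption changes most of the graph.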
Optimizations include:
- Deferred colored rebuilds to amortize costs.
- Color-aware memoization for efficient lookup of composite representative pairs.
- Pruning redundant colored e-nodes after root merges collapse colored structure.
- Early colored minimization: prompt merging of colored structures to reduce layer size.
Empirical results confirm significant memory reductions: memory overhead per assumption is approximately a factor of 10 lower for optimized colored E-graphs compared to naive clones, with matching or better run-time performance on inductive proof and rewrite-heavy benchmarks.
| Approach | Memory Overhead per Assumption | Median Run-time (s) |
|---|---|---|
| Naive clones | 200 | |
| Colored (mono) | Timeout/OOM | |
| Colored (opt) | 210 | |
On these benchmarks, the optimized colored-layer approach yields a substantial memory advantage over naive cloning.
4. Illustrative Examples and Usage Patterns
Classic examples include equality reasoning over terms whose simplification depends on a logical condition: the reasoning splits on that condition and tracks the resulting equalities under separate colors. Root e-graph nodes are inserted once. Merges under each assumption (color) are performed incrementally, with the colored closure affecting only the relevant colored layer. Matching proof queries or program rewrites across all branches proceeds efficiently: a query of the form "are terms $s$ and $t$ equal under color $c$?" reduces to checking whether their base representatives coincide under the union-find for $c$ (Singher et al., 2023).
No entire-graph duplication is needed, and queries/rewrites that are agnostic to the case are never repeated.
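A query of this kind can be illustrated with a small, self-contained sketch. The terms, the `min` example, the color names, and the helper names `find` and `equal_under` are all assumptions made for illustration; terms are opaque strings and no congruence closure is performed.

```python
def find(parent, x):
    """Follow parent links to a representative; absent keys are their own class."""
    while parent.get(x, x) != x:
        x = parent[x]
    return x


def equal_under(root, layer, s, t):
    """Canonicalize in the root first, then in the color's layer."""
    return find(layer, find(root, s)) == find(layer, find(root, t))


# Root layer: an unconditional equality, recorded once and shared by every color.
root = {"min(x,y)+0": "min(x,y)"}

# Colored layers: each holds only the merges implied by its own assumption.
colors = {
    "x<y": {"min(x,y)": "x"},
    "x>=y": {"min(x,y)": "y"},
}

under_lt = equal_under(root, colors["x<y"], "min(x,y)+0", "x")
under_ge = equal_under(root, colors["x>=y"], "min(x,y)+0", "x")
in_root = equal_under(root, {}, "min(x,y)+0", "x")
```

Here `under_lt` is `True`, while `under_ge` and `in_root` are `False`: the root equality is reused by both colors, and each color contributes a single extra link.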
5. Applications and Related Frameworks
Colored E-graphs provide improved infrastructure for:
- Satisfiability Modulo Theories (SMT) solving: branching on predicates is efficiently supported, with theory state sharing.
- Speculative program optimization: branches introduced by optimization hypotheses do not require duplicated data structures.
- Exploratory lemma synthesis and theorem proving: tools with large search trees over case splits (e.g., QuickSpec, TheSy) benefit from global sharing across assumptions.
Comparisons to other models emphasize complementary strengths. Datalog-style (egglog) e-matching supports monotonic propagation for Horn clauses but cannot model non-monotone case analysis; φ-node e-graphs represent program control flow, not case splits with inconsistent assumptions. Methods like Version-Space Algebras manage symbolic dependencies but do not natively exploit colored congruence structure (Singher et al., 2023).
6. Theoretical Context and Future Directions
The colored E-graph abstraction encodes, in a single sharing structure, all coarsenings of an initial congruence resulting from additional branch-local assumptions. This design is uniquely well-suited to problems in automated reasoning and optimization where equality reasoning under multiple scenarios is required and where many rewrites and matches are agnostic to branch.
Potential extensions include more general forms of coloring (e.g., lattice-based side conditions), further integration with SMT architecture, and application to symbolic program analysis frameworks. Fine-grained complexity analyses and detection of minimal colored substructure to optimize further sharing remain active areas of research. The colored E-graph represents a foundational structure for scalable, branch-sensitive equality reasoning in both theoretical and practical domains (Singher et al., 2023).