Colored E-Graphs

Updated 13 January 2026
  • Colored E-graphs are generalized data structures that encode multiple coarsened congruence relations within a unified framework.
  • They employ a layered union-find structure to share base merges while tracking local colored merges under case-specific assumptions.
  • Optimized colored E-graphs significantly lower memory overhead and improve runtime performance in SMT solving, program optimization, and theorem proving.

A colored E-graph is a generalized data structure that encodes multiple coarsened congruence relations, or multiple "possible worlds," in a single e-graph-like representation. Originally developed to improve the efficiency of equality reasoning with case splits in formal logic and program analysis, colored E-graphs achieve substantial asymptotic and practical resource savings by sharing structure across variants induced by different assumptions (Singher et al., 2023).

1. Model: Standard and Colored E-Graphs

A standard e-graph represents congruence over a set of ground terms $L$ built from a signature $\Sigma$. This is achieved by maintaining equivalence classes (e-classes) over subterms and supporting congruence closure under function application. Conventional e-graphs are effective for equality saturation under unconditional rewrite rules: repeated application of universal equalities until closure.

Conditional rewrite rules (for example, $x > y \implies \mathsf{max}(x, y) \to x$) compel one to consider multiple, mutually inconsistent sets of assumptions or "branches". Traditionally, this is handled by duplicating the entire e-graph per case, leading to an $O(K)$ time and memory blow-up for $K$ case splits. Each duplicate processes future rewrites and queries independently.

Colored E-graphs introduce a layered structure. The base (root, "black") e-graph represents the original congruence. For each assumption (modeled by a "color"), a colored congruence is built that coarsens the base: every merge in the root automatically applies to all colored variants, but colored merges (arising from additional case-specific assumptions or equalities) are tracked and enforced locally in each colored layer. The key structural constraint is that for each color $c$, the colored congruence $\cong_c$ is a coarsening of the root congruence, i.e., $\cong_0 \subseteq \cong_c$ (Singher et al., 2023).

2. Data Structures, Core Operations, and Algorithms

A colored E-graph $\mathcal{C}$ consists of:

  • E-node pool $E$ representing shared subterms; hash cons table $H_0$ and parent map $P_0$ for the root.
  • Root union-find $UF_0$ maintaining $\cong_0$.
  • For each color $c$, a union-find $UF_c$ operating over representatives of $UF_0$, with color-local hash cons $H_c$ and parent map $P_c$.
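
The layering of $UF_c$ over representatives of $UF_0$ can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class names `UnionFind` and `ColoredUnionFind` and their methods are hypothetical, hash cons tables and parent maps are omitted, and stale representatives are repaired eagerly on each root merge rather than during a deferred rebuild.

```python
class UnionFind:
    """Plain union-find over hashable ids with path compression."""

    def __init__(self):
        self.parent = {}  # only ids that took part in a merge are stored

    def find(self, x):
        root = x
        while self.parent.get(root, root) != root:
            root = self.parent[root]
        while x != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent.setdefault(ra, ra)
            self.parent[rb] = ra
        return ra


class ColoredUnionFind:
    """Root union-find plus one union-find per color (hypothetical sketch)."""

    def __init__(self, colors):
        self.root = UnionFind()                           # plays the role of UF_0
        self.layers = {c: UnionFind() for c in colors}    # one UF_c per color

    def merge_root(self, a, b):
        keep, dead = self.root.find(a), self.root.find(b)
        if keep == dead:
            return
        self.root.union(keep, dead)  # keep stays the root representative
        # Assumption of this sketch: fix up layers that mention the
        # retired representative right away, so every color inherits the merge.
        for layer in self.layers.values():
            if dead in layer.parent:
                layer.union(keep, dead)

    def merge_colored(self, color, a, b):
        # Colored merges only ever relate representatives of the root.
        self.layers[color].union(self.root.find(a), self.root.find(b))

    def find(self, x, color=None):
        rep = self.root.find(x)
        return rep if color is None else self.layers[color].find(rep)

    def equal(self, a, b, color=None):
        return self.find(a, color) == self.find(b, color)


g = ColoredUnionFind(colors=["red", "blue"])
g.merge_root("a", "b")               # shared by every color
g.merge_colored("red", "b", "c")     # visible only under "red"
print(g.equal("a", "c"))             # False
print(g.equal("a", "c", "red"))      # True
print(g.equal("a", "c", "blue"))     # False
g.merge_root("c", "d")               # a later root merge is inherited by "red"
print(g.equal("a", "d", "red"))      # True
```

Because each layer stores only ids that took part in a colored merge, an untouched color costs no extra space in this sketch, which mirrors the sharing argument made for the real structure.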

Operations are as follows:

  • Insertions: Always performed on the shared root, building new e-nodes as needed.
  • Root merges: $UF_0$ unifies e-classes; all colored union-finds automatically inherit these merges.
  • Colored merges: Within color $c$, only representatives of $UF_0$ are merged in $UF_c$, and the change is catalogued for that color alone.
  • Rebuild/congruence closure: Runs in two phases, a global rebuild (root) followed by a color-local rebuild that propagates colored merges and updates the colored hash cons and parent maps.
  • E-matching: Patterns are considered first on the root; colored variants only require supplemental matching where colored merges yield additional equivalences not observed in the root.

The entire structure ensures single representation of each node and maximal sharing of unchanged results across case splits or assumption branches, since only the minimal set of colored merges and colored e-nodes needs to be maintained per color.
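
The color-local rebuild phase can be illustrated with a small Python sketch. It is a simplified assumption-laden version: instead of parent maps and a deferred worklist, it re-canonicalizes every e-node under the color and repeats until nothing changes. The function names `find` and `colored_rebuild`, and the encoding of e-nodes as tuples `(op, child_id, ...)`, are hypothetical.

```python
def find(parent, x):
    """Follow parent links to the representative (no compression, for brevity)."""
    while parent.get(x, x) != x:
        x = parent[x]
    return x


def colored_rebuild(enodes, root_parent, color_parent):
    """Close one color under congruence; return the number of merges added.

    enodes:       e-class id -> list of e-nodes, each a tuple (op, *child ids)
    root_parent:  parent links of the root union-find (read only here)
    color_parent: parent links of this color's union-find (updated in place)
    """
    def canon(x):
        # Colored representative: root representative first, then the color layer.
        return find(color_parent, find(root_parent, x))

    merges = 0
    changed = True
    while changed:
        changed = False
        hashcons = {}  # color-local hash cons: canonical e-node -> colored class
        for eclass, nodes in enodes.items():
            for op, *children in nodes:
                key = (op, *map(canon, children))
                owner = canon(eclass)
                seen = canon(hashcons.setdefault(key, owner))
                if seen != owner:
                    # Two e-nodes coincide under this color: merge their classes
                    # in the colored layer only, leaving the root untouched.
                    color_parent[owner] = seen
                    merges += 1
                    changed = True
    return merges


# Shared e-node pool: a, b, f(a), f(b). The root has no merges.
enodes = {0: [("a",)], 1: [("b",)], 2: [("f", 0)], 3: [("f", 1)]}
root_parent = {}
color_parent = {1: 0}  # this color assumes a = b

added = colored_rebuild(enodes, root_parent, color_parent)
print(added)                                               # 1
print(find(color_parent, 2) == find(color_parent, 3))      # True: f(a) = f(b) under the color
print(find(root_parent, 2) == find(root_parent, 3))        # False: still distinct in the root
```

The e-nodes themselves are stored once; only the colored parent links grow, which is the per-color delta the complexity discussion refers to.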

3. Complexity Analysis and Optimization

The primary asymptotic saving derives from the fact that under $K$ case splits over a root of $N$ nodes, naive cloning uses $O(KN)$ memory and time for storage and repeated rule application. Colored E-graphs require only $O(N + \sum_c \Delta_c)$ space, with $\Delta_c$ being the number of unique colored merges and colored e-nodes for color $c$, typically far less than $N$ (Singher et al., 2023).
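
As a purely illustrative calculation, with hypothetical values that are not taken from the paper, suppose $K = 8$ case splits, a root of $N = 10^4$ nodes, and $\Delta_c = 100$ for every color. Ignoring constant factors, the two bounds compare as:

```latex
\begin{align*}
\text{naive cloning:} \quad & K \cdot N = 8 \cdot 10^4 = 80{,}000 \\
\text{colored:} \quad & N + \sum_{c=1}^{K} \Delta_c = 10^4 + 8 \cdot 100 = 10{,}800
\end{align*}
```

Under these assumed numbers the colored representation is about 7.4 times smaller, and the gap widens as $K$ grows while each $\Delta_c$ stays small relative to $N$.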

Optimizations include:

  • Deferred colored rebuilds to amortize costs.
  • Color-aware memoization for efficient lookup of composite representative pairs.
  • Pruning redundant colored e-nodes after root merges collapse colored structure.
  • Early colored minimization: prompt merging of colored structures to reduce layer size.

Empirical results indicate significant memory reductions. In the table below, memory overhead per assumption is about $1.1 \times 10^2$ for optimized colored E-graphs versus $10^4$ for naive clones (roughly two orders of magnitude lower), with comparable median run-time (210 s versus 200 s) on inductive proof and rewrite-heavy benchmarks.

| Approach | Memory Overhead per Assumption | Median Run-time (s) |
|---|---|---|
| Naive clones | $10^4$ | 200 |
| Colored (mono) | $1.2 \times 10^3$ | Timeout/OOM |
| Colored (opt) | $1.1 \times 10^2$ | 210 |

In these results, the optimized colored-layer approach yields a substantial memory advantage over naive clones at a comparable run-time.

4. Illustrative Examples and Usage Patterns

Classic examples include equality reasoning over terms such as $\mathsf{max}(x, y) - \mathsf{min}(x, y) \equiv |x-y|$ by splitting on the logical condition $x < y$ and tracking assignments under separate colors. Root e-graph nodes are inserted once. Merges under each assumption (color) are performed incrementally, with the colored closure only affecting the necessary colored layer. Matching proof queries or program rewrites across all branches proceeds efficiently: a query "are $A$ and $B$ equal under $c$?" reduces to checking if their base representatives coincide under $UF_c$ (Singher et al., 2023).

No entire-graph duplication is needed, and queries/rewrites that are agnostic to the case are never repeated.
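
The $\mathsf{max}$/$\mathsf{min}$ example can be spelled out per color. The sketch below assumes two colors, $c_1$ for $x < y$ and $c_2$ for the complementary case $x \ge y$, and assumes the usual conditional rewrites for $\mathsf{max}$, $\mathsf{min}$, and absolute value in each case; the color names and the exact rule set are illustrative rather than taken from the paper.

```latex
\begin{align*}
c_1\ (x < y):\quad
  & \mathsf{max}(x, y) \cong_{c_1} y, \quad
    \mathsf{min}(x, y) \cong_{c_1} x, \quad
    |x - y| \cong_{c_1} y - x \\
  & \Longrightarrow\;
    \mathsf{max}(x, y) - \mathsf{min}(x, y) \cong_{c_1} y - x \cong_{c_1} |x - y| \\[4pt]
c_2\ (x \ge y):\quad
  & \mathsf{max}(x, y) \cong_{c_2} x, \quad
    \mathsf{min}(x, y) \cong_{c_2} y, \quad
    |x - y| \cong_{c_2} x - y \\
  & \Longrightarrow\;
    \mathsf{max}(x, y) - \mathsf{min}(x, y) \cong_{c_2} x - y \cong_{c_2} |x - y|
\end{align*}
```

The middle step in each case follows by congruence of subtraction within the colored layer. The two terms are never merged in the root congruence $\cong_0$ by these steps; the goal is established because the equality holds under every color of the split.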

Colored E-graphs provide improved infrastructure for:

  • Satisfiability Modulo Theories (SMT) solving: branching on predicates is efficiently supported, with theory state sharing.
  • Speculative program optimization: branches introduced by optimization hypotheses do not require duplicated data structures.
  • Exploratory lemma synthesis and theorem proving: tools with large search trees over case splits (e.g., QuickSpec, TheSy) benefit from global sharing across assumptions.

Comparisons to other models emphasize complementary strengths. Datalog-style (egglog) e-matching supports monotonic propagation for Horn clauses but cannot model non-monotone case analysis; φ-node e-graphs represent program control flow, not case splits with inconsistent assumptions. Methods like Version-Space Algebras manage symbolic dependencies but do not natively exploit colored congruence structure (Singher et al., 2023).

6. Theoretical Context and Future Directions

The colored E-graph abstraction encodes, in a single sharing structure, all coarsenings of an initial congruence resulting from additional branch-local assumptions. This design is well suited to problems in automated reasoning and optimization where equality reasoning under multiple scenarios is required and where many rewrites and matches are agnostic to branch.

Potential extensions include more general forms of coloring (e.g., lattice-based side conditions), further integration with SMT architecture, and application to symbolic program analysis frameworks. Fine-grained complexity analyses and detection of minimal colored substructure to optimize further sharing remain active areas of research. The colored E-graph represents a foundational structure for scalable, branch-sensitive equality reasoning in both theoretical and practical domains (Singher et al., 2023).
