EGG-MCTS: E-Graph Enhanced MCTS

Updated 12 November 2025

EGG-MCTS is a method that integrates e-graph equivalence detection with MCTS to merge functionally equivalent but syntactically different derivations.
It reduces redundant exploration by lowering the effective branching factor, leading to tighter regret bounds and more efficient search.
It is applied in symbolic regression, rewrite systems, and symbolic AI, delivering superior empirical performance compared to standard MCTS approaches.

EGG-MCTS is a class of methods that enhance Monte Carlo Tree Search (MCTS) for symbolic domains by embedding e-graph (equality graph) data structures to efficiently merge and propagate information across symbolically equivalent, but syntactically distinct, derivations. Originally proposed in the context of rewrite systems and later formalized for symbolic regression, EGG-MCTS methods address redundant exploration in combinatorial, equivalence-rich environments by recognizing and leveraging symbolic equivalence throughout the MCTS process. This synergistic combination yields tighter regret bounds and superior empirical optimization results compared to standard MCTS or equality-saturation techniques alone (He et al., 2023, Jiang et al., 8 Nov 2025).

1. Motivation and Theoretical Background

Traditional MCTS explores a search tree in which each node corresponds to a unique, syntactically defined candidate (e.g., a partial expression or program). In symbolic domains such as algebraic simplification or symbolic regression, many syntactically different expressions are functionally equivalent: for example, $\log(x_1^2x_2^3)$, $\log(x_1^2)+\log(x_2^3)$, and $2\log(x_1)+3\log(x_2)$ are identical for positive $x_1, x_2$. Standard MCTS does not identify these equivalences, so equivalent subtrees are treated as independent, which leads to redundant exploration, slow learning, and high sample complexity.
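
As a quick numerical sanity check of the logarithm example, the following sketch evaluates the three forms at random positive inputs and reports the largest disagreement. The function names, sampling range, and sample count are illustrative choices, not taken from the cited papers.

```python
import math
import random

def form_a(x1, x2):
    return math.log(x1**2 * x2**3)

def form_b(x1, x2):
    return math.log(x1**2) + math.log(x2**3)

def form_c(x1, x2):
    return 2 * math.log(x1) + 3 * math.log(x2)

def max_disagreement(n_samples=1000, seed=0):
    """Largest absolute difference between the three forms over random positive inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_samples):
        # Positive inputs only: the identities rely on log being defined.
        x1, x2 = rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0)
        a, b, c = form_a(x1, x2), form_b(x1, x2), form_c(x1, x2)
        worst = max(worst, abs(a - b), abs(a - c))
    return worst

print(max_disagreement())  # differences are at floating-point rounding level
```

A syntactic search treats `form_a`, `form_b`, and `form_c` as three unrelated candidates even though they always score the same.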

EGG-MCTS, building on e-graph technology from equality saturation, embeds fast equivalence detection to dynamically merge states or update statistics jointly across equivalence classes. Merging replaces the near-optimal branching factor $\kappa$ of the planning problem with a smaller effective branching factor $\kappa_\infty \le \kappa$. The smaller branching factor yields an asymptotically improved regret bound: $\mathtt{regret}_{\mathrm{egg}}(n) = \widetilde O\left(n^{-\frac{\log(1/\gamma)}{\log \kappa_\infty}}\right), \quad \kappa_\infty \leq \kappa,$ where $\gamma$ is the discount factor of the MDP (Jiang et al., 8 Nov 2025).
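
To see how a smaller branching factor changes the rate, the sketch below computes the exponent $\log(1/\gamma)/\log\kappa$ for two branching factors. The values of `gamma`, `kappa`, and `kappa_inf` are hypothetical and chosen only for illustration; logarithmic factors hidden by $\widetilde O$ are ignored.

```python
import math

def regret_exponent(gamma, kappa):
    """Exponent a in regret(n) ~ n**(-a), with a = log(1/gamma) / log(kappa)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if kappa <= 1.0:
        raise ValueError("kappa must exceed 1 for the exponent to be finite")
    return math.log(1.0 / gamma) / math.log(kappa)

gamma = 0.9                   # hypothetical discount factor
kappa, kappa_inf = 4.0, 2.0   # hypothetical branching factors before/after merging

a_std = regret_exponent(gamma, kappa)
a_egg = regret_exponent(gamma, kappa_inf)
print(a_std, a_egg)  # the merged search has the larger exponent, i.e. faster decay
```

With these numbers, halving $\log\kappa$ doubles the exponent, so the bound decays quadratically faster in $n$.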

2. Algorithmic Structure and Methodology

EGG-MCTS integrates the classical phases of MCTS—selection, expansion, simulation, and backpropagation—with e-graph-based equivalence handling.

State Representation and Action Space:
- States $S_t$ are typically defined by sequences of applied grammar or rewrite rules generating partial expressions.
- The action space $A(S_t)$ is the set of rewrite or grammar rules applicable at $S_t$ (inapplicable actions are pruned per state).
Transition Model:
- The transition $f(S_t, a_{t+1})$ applies action $a_{t+1}$ to state $S_t$, resulting in a new expression or e-graph node.
Reward Function:
- Standard reward in rewrite optimization: $R = \max(\text{init\_cost} - \text{current\_cost}, 0)$, with costs based on a user-defined objective (e.g., expression length, or NMSE in regression).
- In symbolic regression, the reward after a rollout is $r = 1/(1 + \text{NMSE}(\varphi; D))$, where $\varphi$ is the candidate expression and $D$ the dataset.
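
The two reward definitions above can be sketched as follows. The NMSE normalization used here (mean squared error divided by the variance of the targets) is an assumption, since the text does not spell it out, and all function names are illustrative.

```python
def rewrite_reward(init_cost, current_cost):
    """R = max(init_cost - current_cost, 0): only improvements are rewarded."""
    return max(init_cost - current_cost, 0)

def nmse(y_true, y_pred):
    """Assumed definition: MSE normalized by the variance of the targets."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var = sum((t - mean_y) ** 2 for t in y_true) / n
    if var == 0:
        raise ValueError("targets are constant; NMSE is undefined")
    return mse / var

def regression_reward(y_true, y_pred):
    """r = 1 / (1 + NMSE): equals 1 for a perfect fit and shrinks toward 0 as the fit worsens."""
    return 1.0 / (1.0 + nmse(y_true, y_pred))

print(rewrite_reward(10, 7))                    # cost dropped by 3
print(regression_reward([1, 2, 3], [1, 2, 3]))  # perfect fit
```
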
Selection and Expansion:
- Selection follows UCT, or WU-UCT when the search is parallelized: at each node $p$, the child $i$ with the highest score

$U_i = \frac{W_i}{N_i} + c \sqrt{\frac{\ln N_p}{N_i}}$

is selected, where $N_i$ and $N_p$ are the visit counts of child $i$ and parent $p$, $W_i$ is the cumulative reward of child $i$, and $c$ is the exploration constant.
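
A minimal sketch of the plain UCT rule follows (the WU-UCT variant for parallel search is not shown). The dictionary layout, the default `c=1.4`, and the convention of scoring unvisited children as infinite are assumptions made for illustration.

```python
import math

def uct_score(w_i, n_i, n_p, c=1.4):
    """U_i = W_i / N_i + c * sqrt(ln(N_p) / N_i); unvisited children are tried first."""
    if n_i == 0:
        return math.inf
    return w_i / n_i + c * math.sqrt(math.log(n_p) / n_i)

def select_child(children, n_p, c=1.4):
    """children maps a child id to its (W_i, N_i) pair; returns the id with the highest score."""
    return max(children, key=lambda i: uct_score(*children[i], n_p, c))

children = {"a": (8.0, 10), "b": (1.0, 2)}  # hypothetical statistics
print(select_child(children, n_p=12))        # exploration bonus favors the rarely visited child
print(select_child(children, n_p=12, c=0))   # pure exploitation picks the higher mean reward
```
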

Simulation (Rollout):
- Rollouts apply random actions until a terminal state or the maximum depth is reached.
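
A random rollout of this kind can be sketched on a toy grammar. The grammar, the leftmost-nonterminal expansion order, and the names `RULES`, `apply_rule`, and `rollout` are hypothetical and serve only to make the idea concrete.

```python
import random

# Hypothetical toy grammar: one nonterminal "E" and four production rules.
RULES = {
    "E->E+E": ["E", "+", "E"],
    "E->E*E": ["E", "*", "E"],
    "E->x": ["x"],
    "E->1": ["1"],
}

def apply_rule(symbols, rule):
    """Rewrite the leftmost nonterminal "E" using the given rule."""
    i = symbols.index("E")
    return symbols[:i] + RULES[rule] + symbols[i + 1:]

def rollout(symbols, max_depth, rng):
    """Apply uniformly random rules until no "E" remains or max_depth rules were used."""
    actions = []
    while "E" in symbols and len(actions) < max_depth:
        rule = rng.choice(list(RULES))
        symbols = apply_rule(symbols, rule)
        actions.append(rule)
    return symbols, actions

rng = random.Random(0)
final_symbols, actions = rollout(["E"], max_depth=20, rng=rng)
print("".join(final_symbols), len(actions))
```

If the depth limit is hit first, the returned symbol list still contains `"E"` and the rollout is incomplete; how such cases are scored is left open here.
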
Backpropagation with E-Graph:
- During backpropagation, an e-graph is constructed for the current derivation, undergoes saturation (by applying rewrite rules), and up to $K$ equivalent derivations are extracted.
- The reward for a rollout is then used to update visitation counts and cumulative rewards for all equivalent tree paths present in the search tree.

Summary pseudocode for EGG-MCTS backpropagation:

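# τ_path: rule sequence of the current derivation; r: reward of the finished rollout
eg = EGraph.build(τ_path)
eg.saturate(max_iter)                      # apply rewrite rules for at most max_iter rounds
EqSeqs = eg.extract_equivalent(τ_path, K)  # up to K equivalent derivations
for τ_prime in EqSeqs:
    if tree.has_path(τ_prime):             # skip derivations not present in the search tree
        for (s_i, a_i) in tree.path_to(τ_prime):
            visits[s_i, a_i] += 1
            reward_sum[s_i, a_i] += r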
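
The same update can be made runnable on a toy problem. This sketch does not use a real e-graph: `equivalent_paths` is a stand-in that enumerates reorderings of the action sequence, under the assumption that every action adds one term to a sum so that order does not matter. The data structures (`visits`, `reward_sum`, `tree_paths`) are illustrative.

```python
from collections import defaultdict
from itertools import islice, permutations

visits = defaultdict(int)        # (state, action) -> visit count
reward_sum = defaultdict(float)  # (state, action) -> cumulative reward
tree_paths = set()               # full action sequences currently in the search tree

def equivalent_paths(path, k):
    """Stand-in for e-graph saturation and extraction: up to k distinct reorderings."""
    distinct = dict.fromkeys(permutations(path))  # keeps order, drops duplicates
    return list(islice(distinct, k))

def backpropagate(path, r, k):
    """Credit reward r to every equivalent path that already exists in the tree."""
    for eq_path in equivalent_paths(path, k):
        if eq_path not in tree_paths:
            continue
        for depth, action in enumerate(eq_path):
            state = eq_path[:depth]  # a state is the prefix of actions taken so far
            visits[state, action] += 1
            reward_sum[state, action] += r

tree_paths.update({("+x", "+y"), ("+y", "+x")})
backpropagate(("+x", "+y"), r=0.8, k=10)
print(dict(visits))
```

A single rollout along `("+x", "+y")` also updates the statistics of `("+y", "+x")`, which is the sharing effect the pseudocode describes.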

3. Computational Complexity and Implementation Considerations

The primary overhead introduced by EGG-MCTS derives from e-graph construction and equivalence-class extraction:

Per-iteration complexity: If $T_0$ is the cost of a standard MCTS iteration (dominated by random simulation and coefficient fitting), EGG-MCTS incurs an additional $O(|R| \cdot H + K \cdot H)$ per iteration for rule applications and equivalence extraction, where $|R|$ is the number of rewrite rules, $H$ the maximum derivation length, and $K$ the number of extracted variants.
Resource consumption: Empirically, e-graph operations account for substantially less than 10% of the total CPU time per iteration in symbolic regression tasks; the remainder is dominated by rollouts and coefficient fitting.
Parallelism: Both MCTS-GEB (He et al., 2023) and EGG-MCTS support parallel expansion and simulation, yielding near-linear speedups; He et al. (2023) report that parallel MCTS is faster than vanilla MCTS on their benchmarks.
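
For a rough feel of the extra per-iteration term $|R| \cdot H + K \cdot H$, the sketch below counts abstract operation units with constants omitted. The rule count of 30 is hypothetical; $H = 20$ and $K = 10$ are used as example settings.

```python
def egraph_overhead_units(num_rules, horizon, k):
    """Operation count proportional to |R|*H + K*H (constant factors omitted)."""
    return num_rules * horizon + k * horizon

# Hypothetical configuration: 30 rewrite rules, derivation length 20, 10 extracted variants.
units = egraph_overhead_units(num_rules=30, horizon=20, k=10)
print(units)  # → 800
```

The term grows linearly in each of $|R|$, $H$, and $K$, so doubling the rule set roughly doubles the rule-application part while leaving the extraction part unchanged.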

4. Empirical Performance and Benchmarks

Quantitative results from both rewrite optimization and symbolic regression domains demonstrate the practical benefits of EGG-MCTS:

| Domain | Main metric | Standard MCTS | EGG-MCTS |
|---|---|---|---|
| Math (rewrite, [2303]) | Min. extracted expr. length | Baseline | 1.37$\times$ reduction |
| | Optimization time | $\sim$400s | $\sim$400s, 6$\times$ speedup over vanilla |
| Prop (rewrite, [2303]) | Min. expr. length improvement | | Up to 49$\times$ |
| Trig regression ([2511]) | Median Top-10 NMSE (3,2,2) | 0.033 | $<10^{-6}$ |
| | Tree size after 150 iterations | 25k nodes | 40k nodes (more diversity) |
| | NMSE on Feynman dataset (Top-10) | Baseline | 1 to 2 orders of magnitude lower |

EGG-MCTS matches or substantially outperforms standard methods in both solution quality and convergence speed (measured in rollouts or iterations). Per-benchmark heatmaps further indicate that EGG-MCTS concentrates rewrites or expansions on the more promising rules, in contrast to the uniform sweep of classical equality saturation.

5. Regret Analysis and Theoretical Guarantees

The theoretical analysis formalizes the effect of equivalence-based merging on search complexity. In a finite-horizon MDP setting:

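Difficulty measure: The difficulty of the search problem for standard MCTS is captured by the near-optimal branching factor $\kappa$,

$\kappa = \limsup_{h \to \infty} |T_h^\infty|^{1/h}, \quad T_h^\infty = \{ a \in \mathcal A^h : V^* - V(a) \le \gamma^h/(1-\gamma) \},$

where $T_h^\infty$ is the set of length-$h$ action sequences whose value $V(a)$ lies within $\gamma^h/(1-\gamma)$ of the optimal value $V^*$.

EGG-MCTS replaces $\kappa$ with a reduced branching factor $\kappa_\infty$, reflecting the redundancy eliminated by merging.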

Regret improvement:

$\mathtt{regret}_{\mathrm{egg}}(n) = \widetilde O\left(n^{-\log(1/\gamma)/\log \kappa_\infty}\right), \quad \kappa_\infty \leq \kappa.$

Thus, for equivalence-rich domains, the effective distinct search paths shrink, directly tightening the upper bound on sample complexity (Jiang et al., 8 Nov 2025).
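
Read as a sample-complexity statement, the bound can be inverted for a target regret $\varepsilon \in (0,1)$. The following short derivation ignores the logarithmic factors hidden by $\widetilde O$ and assumes $\kappa_\infty > 1$; $\varepsilon$ and $a$ are symbols introduced here for illustration.

```latex
\begin{aligned}
a &= \frac{\log(1/\gamma)}{\log \kappa_\infty}, \qquad
n^{-a} \le \varepsilon \;\iff\; n \ge \varepsilon^{-1/a}, \\
n &\ge \varepsilon^{-\log \kappa_\infty / \log(1/\gamma)}
   = \left(\frac{1}{\varepsilon}\right)^{\log \kappa_\infty / \log(1/\gamma)} .
\end{aligned}
```

Since the exponent $\log \kappa_\infty / \log(1/\gamma)$ increases with $\kappa_\infty$, any reduction from $\kappa$ to $\kappa_\infty$ lowers the number of samples this bound requires for the same $\varepsilon$.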

6. Application Domains and Extensions

EGG-MCTS is applicable wherever symbolic search spaces admit rich equivalence relations expressible as rewrite rules. This includes:

Rewrite systems: Accelerating and refining equality-saturation methods by planning optimal e-graph construction policies beyond simple rule sweeping (He et al., 2023).
Symbolic regression: Pruning the search tree in regression problems to effectively avoid redundant rediscovery of equivalent fits, thereby improving both fit quality (e.g., lower NMSE) and sample efficiency (Jiang et al., 8 Nov 2025).
Other symbolic AI domains: Any MDP or planning task where state equivalence is nontrivial (e.g., logic synthesis, program induction, formal verification).

A plausible implication is that further integration with deep reinforcement learning (DRL) and LLM-guided methods (as in EGG-SR) could yield additional performance gains by propagating equivalence information into learning signals and feedback prompts.

7. Limitations and Practical Considerations

While EGG-MCTS reduces redundancy and inefficiency in combinatorial search, its effectiveness depends on the following:

Availability and completeness of rewrite systems: The ability to merge equivalents fundamentally relies on the expressive power and coverage of the underlying rewrite rule set. Incomplete systems may leave equivalence undetected.
E-graph overheads: Although practical measurements show small (<10%) e-graph saturation and extraction overhead for typical configurations ($H \leq 20$, $K \leq 10$), very large or complex grammars or excessively high $K$ may affect scalability.
Hyperparameter sensitivity: Key parameters such as rollout budget, node limits, and exploration constants remain problem-dependent, but were found robust in published experiments.

In summary, EGG-MCTS generalizes and unifies advances in symbolic equivalence management within MCTS, leading to principled and empirically validated gains in diverse symbolic optimization and learning domains (He et al., 2023, Jiang et al., 8 Nov 2025).

References

He et al. (2023). MCTS-GEB: Monte Carlo Tree Search is a Good E-graph Builder.

Jiang et al. (2025). EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph.
