Exhaustive Symbolic Regression (ESR)

Updated 18 December 2025
  • Exhaustive Symbolic Regression (ESR) is a deterministic framework that systematically enumerates analytic expressions to guarantee the discovery of a globally optimal function.
  • It employs exhaustive tree generation combined with semantic deduplication to prune redundant expressions and improve search efficiency.
  • ESR integrates parameter fitting and MDL-based model selection, making it effective for moderate-sized problems in fields like astrophysics and engineering.

Exhaustive Symbolic Regression (ESR) is a class of deterministic algorithms for symbolic regression (SR) that systematically enumerates and evaluates all possible analytic expressions—constructed from a user-specified operator set and up to a defined maximum structural complexity—to identify the globally optimal function (modulo parameter fitting) for a given dataset. ESR is contrasted with traditional heuristic or stochastic SR approaches such as genetic programming (GP): whereas GP explores the search space probabilistically and may miss the global minimum or redundantly evaluate semantically equivalent models, ESR guarantees completeness and optimality within the constrained function space by explicit search and semantic deduplication (Kronberger et al., 26 Apr 2024, Bartlett et al., 2022, Desmond, 17 Jul 2025). The exponential growth of the search space with expression size imposes practical limitations, but advances in equivalence class reduction, parallelization, and minimum description length (MDL) model selection have rendered ESR competitive and informative for moderate problem sizes and in high-interpretability settings.

1. Formal Definition and Theoretical Properties

ESR defines its hypothesis space by a fixed context-free grammar $G$ comprising:

  • A finite set of nullary, unary, and binary operators: variables, numeric parameter placeholders, and mathematical operations (e.g., $+$, $-$, $\times$, $\div$, $\exp$, $\log$, $\sin$, $\mathrm{inv}$, $\mathrm{pow}$).
  • A user-specified maximum expression complexity: typically measured by total node count $L$ in the expression tree (or, alternatively, by DAG node count).
  • The exhaustive enumeration of all syntactically valid expressions generated from $G$ of size $\leq L$, including all permutations of operator assignments to tree templates (a minimal enumeration sketch follows this list).
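
To make the enumeration phase concrete, the following is a minimal sketch over a toy grammar. The operator set and the helper names (`trees_of_size`, `enumerate_up_to`) are illustrative assumptions, not taken from any published ESR implementation.

```python
from functools import lru_cache

# Toy grammar: nullary symbols (one variable, one parameter placeholder),
# plus unary and binary operators. Illustrative only; real ESR codes use
# template-based generation over richer operator sets.
NULLARY = ("x", "theta")
UNARY = ("exp", "log", "inv")
BINARY = ("+", "-", "*", "/", "pow")

@lru_cache(maxsize=None)
def trees_of_size(n):
    """All expression trees (as nested tuples) with exactly n nodes."""
    if n == 1:
        return [leaf for leaf in NULLARY]
    out = []
    # A unary root uses 1 node; the child receives the remaining n - 1.
    for op in UNARY:
        for child in trees_of_size(n - 1):
            out.append((op, child))
    # A binary root uses 1 node; split the remaining n - 1 between children.
    for op in BINARY:
        for k in range(1, n - 1):
            for left in trees_of_size(k):
                for right in trees_of_size(n - 1 - k):
                    out.append((op, left, right))
    return out

def enumerate_up_to(L):
    """Every syntactically valid expression with node count <= L."""
    return [t for n in range(1, L + 1) for t in trees_of_size(n)]

# Candidate counts grow exponentially with the node budget L.
for L in range(1, 7):
    print(L, len(enumerate_up_to(L)))
```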

The primary goal of ESR is to find the globally optimal symbolic function $f(x; \theta)$, with fitted parameters $\theta$, that best meets a chosen loss or likelihood objective (e.g., mean-squared error, negative log-likelihood, or a code-length criterion) over the provided data, subject to the expressivity bounds of $G$ and $L$ (Kronberger et al., 26 Apr 2024, Bartlett et al., 2022, Desmond, 17 Jul 2025).

Symbolic regression in general is known to be NP-hard, even when restricted to simple grammars and loss functions, implying that any exhaustive search algorithm, including ESR, must contend with exponential worst-case scaling in $L$ or the number of primitives (Virgolin et al., 2022). ESR is therefore tractable only for expressions of modest size or under highly restricted grammars.

2. Enumeration Algorithm and Search Space Reduction

The ESR workflow comprises three core phases:

  1. Tree Generation (Enumeration):
    • Generate all rooted, ordered tree (or DAG) templates up to the complexity limit. Each tree encodes the arity structure (commutative, associative, or otherwise) of an expression (Bartlett et al., 2022, Desmond, 17 Jul 2025, Kammerer et al., 2021).
    • For each template, exhaustively assign operator symbols and leaf roles (variable or parameter) to yield all raw expressions.
  2. Semantic Deduplication (Equality Saturation):
    • Many distinct syntax trees are functionally congruent. ESR applies algebraic simplification rules—commutativity, associativity, distributivity, constant folding, parameter reparametrization, and custom operator-specific rewrites—so that all semantically equivalent expressions map to a canonical form (Kronberger et al., 26 Apr 2024, Bartlett et al., 2022, Desmond, 17 Jul 2025, Kammerer et al., 2021).
    • Equality saturation (eq-sat) leverages e-graph structures to efficiently collapse congruent trees and extract unique representatives (Kronberger et al., 26 Apr 2024).
    • Hashing and memoization are used to avoid redundant fitting and evaluation of duplicates.
  3. Parameter Fitting:
    • For each unique expression, the free parameters $\theta$ are fitted by numerical optimization of the chosen loss or likelihood objective, and the best-fit score is recorded for ranking (a sketch follows this list).
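
A sketch of the fitting phase for one candidate, assuming a Gaussian likelihood with known noise level and random multi-restart optimization; these choices, and the `fit_candidate` helper, are illustrative rather than prescribed by the cited papers.

```python
import numpy as np
from scipy.optimize import minimize

def fit_candidate(f, n_params, x, y, sigma, n_restarts=10, seed=0):
    """Fit the free parameters of one candidate f(x, theta) by minimizing
    the Gaussian negative log-likelihood, restarting from random initial
    guesses to reduce the risk of stopping in a local minimum."""
    rng = np.random.default_rng(seed)

    def nll(theta):
        # Assumed homoscedastic Gaussian noise with known sigma.
        resid = (y - f(x, theta)) / sigma
        return 0.5 * np.sum(resid**2 + np.log(2.0 * np.pi * sigma**2))

    best = None
    for _ in range(n_restarts):
        theta0 = rng.normal(scale=3.0, size=n_params)  # assumed init scale
        res = minimize(nll, theta0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best.x, best.fun  # best-fit theta and -log L(theta_hat)

# Example candidate: f(x; theta) = theta_0 * exp(theta_1 * x).
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * x) + np.random.default_rng(1).normal(0.0, 0.1, 50)
theta_hat, nll_min = fit_candidate(
    lambda x, th: th[0] * np.exp(th[1] * x), n_params=2, x=x, y=y, sigma=0.1)
```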

The reduction in unique expressions after semantic deduplication is substantial. For example, for a grammar yielding $\sim 0.053\,\exp(1.82L)$ raw trees at complexity $L$, equality saturation typically reduces this to $\sim 0.18\,\exp(1.30L)$ semantically unique expressions (Kronberger et al., 26 Apr 2024).
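
The following toy example uses sympy simplification as a stand-in for equality saturation, mapping each candidate to a canonical form and hashing it; the e-graph machinery of the cited work catches strictly more equivalences than this sketch.

```python
import sympy as sp

x, a = sp.symbols("x theta0")

# Syntactically distinct candidates, several functionally congruent.
candidates = [
    x + a,
    a + x,                     # commutative duplicate (sympy already
                               # canonicalizes argument order at construction)
    sp.exp(sp.log(x)),         # inverse pair, collapses to x
    x,
    (x**2 - a**2) / (x - a),   # algebraically equal to x + a
]

# Canonicalize each candidate and hash the canonical form to deduplicate.
# Real ESR pipelines also reparametrize constants (e.g., absorbing shifts
# and scales into the free parameters).
unique = {sp.simplify(e) for e in candidates}
print(len(candidates), "raw ->", len(unique), "unique")  # 5 raw -> 2 unique
```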

3. Model Evaluation and the Minimum Description Length Principle

While ESR can be used with any loss function, modern implementations adopt the minimum description length (MDL) principle for model selection and ranking (Bartlett et al., 2022, Desmond, 17 Jul 2025, Martín et al., 28 Nov 2025). The MDL for an expression $f$ and data $D$ is given as

$$L(D) = -\log \mathcal{L}(\hat\theta) + k \log n + \sum_j \log c_j + \sum_{i=1}^{p} \left[ \frac{1}{2}\log I_{ii} + \log |\hat\theta_i| \right] - \frac{p}{2}\log 3$$

where:

  • $-\log \mathcal{L}(\hat\theta)$ is the negative log-likelihood at the best-fit parameters,
  • $k$ is the number of nodes in the expression tree,
  • $n$ is the number of available operators,
  • $c_j$ are the integer constants appearing in the formula,
  • $p$ is the number of real-valued parameters,
  • $I_{ii}$ is the observed Fisher information for $\theta_i$ at the optimum,
  • all terms are in nats.

MDL penalizes both complexity (number of nodes, parameters, and their required precision) and the lack of fit, establishing an information-theoretic trade-off that collapses the standard Pareto frontier into a unique scalar ranking (Bartlett et al., 2022, Desmond, 17 Jul 2025, Martín et al., 28 Nov 2025). ESR ensures that the lowest-$L(D)$ expression is globally optimal within the exhaustively searched space.
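
A direct transcription of the formula above into code; obtaining the Fisher information diagonal (e.g., from a numerical Hessian of the negative log-likelihood) is left outside this sketch, and the example numbers are hypothetical.

```python
import numpy as np

def mdl_codelength(neg_log_like, k, n_ops, int_consts, theta_hat, fisher_diag):
    """Codelength L(D) following the formula in the text (all terms in nats).

    neg_log_like -- -log L(theta_hat), the fit term
    k            -- number of nodes in the expression tree
    n_ops        -- number of available operators
    int_consts   -- integer constants appearing in the formula
    theta_hat    -- best-fit real-valued parameters (length p)
    fisher_diag  -- observed Fisher information diagonal I_ii at the optimum
    """
    p = len(theta_hat)
    structure = k * np.log(n_ops)                    # k log n: tree structure
    constants = sum(np.log(c) for c in int_consts)   # sum_j log c_j
    params = sum(0.5 * np.log(I) + np.log(abs(t))    # parameter precision terms
                 for I, t in zip(fisher_diag, theta_hat))
    return neg_log_like + structure + constants + params - 0.5 * p * np.log(3.0)

# Hypothetical numbers for one candidate; rank candidates by ascending L(D).
print(mdl_codelength(neg_log_like=42.7, k=9, n_ops=8, int_consts=[2],
                     theta_hat=[1.97, 1.48], fisher_diag=[310.0, 95.0]))
```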

4. Computational Complexity and Scalability

The principal bottleneck in ESR is combinatorial growth of the candidate space. The number of possible expression trees for a reasonable operator set grows exponentially with the complexity threshold. For operator set size $n$ and maximum complexity $k$, the raw labeling count scales as $n^k$; pruning to valid templates and further semantic simplification reduces, but does not eliminate, exponential scaling (Bartlett et al., 2022, Desmond, 17 Jul 2025, Kronberger et al., 26 Apr 2024). For instance, with $k=10$ and 8 operators, enumeration yields $\sim 5.2$ million valid trees and $\sim 1.3 \times 10^5$ unique expressions (Bartlett et al., 2022).
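
Plugging the empirical fits quoted in Section 2 into a few lines makes the scaling concrete; the constants come from the cited fits, while the ratio calculation is an illustrative extrapolation.

```python
import math

# Empirical fits quoted above: raw trees ~ 0.053 exp(1.82 L),
# semantically unique expressions ~ 0.18 exp(1.30 L).
for L in (5, 10, 15):
    raw = 0.053 * math.exp(1.82 * L)
    unique = 0.18 * math.exp(1.30 * L)
    print(f"L={L:2d}  raw ~ {raw:9.2e}  unique ~ {unique:9.2e}  "
          f"ratio ~ {raw / unique:9.1f}")
# The ratio grows like ~0.29 exp(0.52 L): deduplication wins an ever larger
# constant factor, but both counts remain exponential in L.
```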

Parallelization, caching, precomputed libraries of simplified expressions, and staged structural pruning (e.g., application of Pareto fronts in loss–complexity space) are employed for tractability (Kahlmeyer et al., 24 Jun 2025, Desmond, 17 Jul 2025). Complete ESR remains tractable only up to $k \sim 10$–$12$ for modest grammars on cluster-scale resources; beyond this, sampling-based or guided search, or strong grammar constraints, become necessary.
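
A minimal sketch of the Pareto pruning mentioned above, keeping only candidates not dominated in the loss–complexity plane; the tuple layout and the `pareto_front` name are illustrative assumptions.

```python
def pareto_front(candidates):
    """candidates: iterable of (expr, complexity, loss) tuples.
    Keep only candidates not dominated by a simpler-or-equal,
    better-or-equal alternative."""
    best_loss = float("inf")
    front = []
    # Sorted by complexity (loss breaks ties), a candidate survives iff it
    # strictly improves on the best loss achieved at lower complexity.
    for expr, complexity, loss in sorted(candidates, key=lambda c: (c[1], c[2])):
        if loss < best_loss:
            front.append((expr, complexity, loss))
            best_loss = loss
    return front

# The last two entries are dominated (more complex, no better loss) and dropped.
print(pareto_front([("x", 1, 0.90), ("a*x", 3, 0.30),
                    ("a*x + b", 5, 0.31), ("a*x**2 + b", 7, 0.32)]))
```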

5. Empirical Performance and Applications

Empirical studies demonstrate that, for typical complexity thresholds and noise levels, ESR can outperform both GP and state-of-the-art neural and heuristic symbolic regressors in accuracy, noise robustness, and recovery of ground-truth expressions (Kronberger et al., 26 Apr 2024, Kahlmeyer et al., 24 Jun 2025, Bartlett et al., 2022). Notable application domains include:

  • Astrophysics: ESR has been employed to analyze cosmic expansion rate data and galaxy acceleration laws, producing expressions with lower MDL than standard theoretical models, leading to informative comparisons with cosmological paradigms (Desmond, 17 Jul 2025, Bartlett et al., 2022).
  • Physics and Engineering Benchmarks: ESR achieves exact or near-exact solutions for simple noiseless tasks, and exhibits high recovery rates and interpretability in real-world system identification (Kammerer et al., 2021, Kahlmeyer et al., 24 Jun 2025).
  • Dark Matter Halos: ESR recovers canonical NFW profiles at low noise, but prefers simpler models as data quality degrades, effectively quantifying structural uncertainty imposed by observational error (Martín et al., 28 Nov 2025).
  • Comparisons with GP: ESR consistently finds the global optimum (within the grammar), while GP repeatedly evaluates redundant or equivalent structures, yielding only $10$–$40\%$ novel expressions and failing to reach global optima in exhaustive spaces (Kronberger et al., 26 Apr 2024).

6. Limitations, Extensions, and Open Challenges

The chief limitation of ESR is exponential computational cost with increasing expression size, operator set richness, or problem dimensionality. Redundancy elimination, grammar restriction, DAG representations, and hybrid guided–exhaustive search offer partial mitigation, but scaling to deep or highly multivariate expressions remains a hard barrier (Virgolin et al., 2022, Kahlmeyer et al., 24 Jun 2025, Kammerer et al., 2021).

Open research challenges include:

  • On-the-fly semantic deduplication or equality saturation in non-exhaustive SR procedures.
  • Domain-specific priors for structure (e.g., Katz back-off language models or domain-theory knowledge) to regularize or restrict the ESR search space.
  • Integration of ESR with differentiable or learning-based proposal mechanisms for partially guided exhaustive search (Desmond, 17 Jul 2025, Kahlmeyer et al., 24 Jun 2025).
  • Quantifying the irreducible redundancy in SR search spaces at high complexity or variable count.

7. Impact and Outlook

ESR, coupled with principled model selection, provides a rigorous and transparent framework for uncovering analytic laws directly from finite data, automatically and quantitatively balancing accuracy and simplicity. Its completeness and deterministic nature render it valuable for high-stakes scientific and engineering applications requiring reproducible, interpretable, and globally optimal solutions. Ongoing advances in semantic deduplication algorithms, efficient enumeration, and hybridization with learning-based strategies continue to push the tractable frontiers of exhaustive SR, with ESR positioned as a central reference methodology for the discipline (Kronberger et al., 26 Apr 2024, Bartlett et al., 2022, Desmond, 17 Jul 2025).
