Exhaustive Symbolic Regression (ESR)
- The paper demonstrates that exhaustive enumeration of candidate models up to a fixed complexity ensures global optimality within the chosen basis.
- It integrates information-theoretic, MDL-based model selection with robust, parallelized parameter fitting to overcome limitations of traditional stochastic approaches.
- Empirical results show that ESR yields interpretable, parsimonious models that can outperform conventional genetic programming in both accuracy and efficiency.
Exhaustive Symbolic Regression (ESR) is a deterministic, globally optimal framework for symbolic regression that systematically enumerates all possible analytic expressions up to a prescribed complexity threshold and ranks them using the minimum description length (MDL) principle. ESR addresses two central limitations of conventional stochastic approaches (the risk of missing optimal solutions and the ambiguity of function-selection criteria) by combining exhaustive combinatorial search with information-theoretic model selection. The methodology extends to diverse operator sets and scientific domains; empirical studies show it to be superior or complementary to traditional genetic-programming-based methods.
1. Problem Formalization and Exhaustive Search Structure
In symbolic regression, the objective is to recover an analytic function $f$ from a finite dataset $D = \{(x_i, y_i, \sigma_i)\}_{i=1}^{N}$, where $y_i$ are observed targets, $\sigma_i$ measurement uncertainties, and $f$ is drawn from expressions constructed with operators and variables from a predefined basis. ESR imposes a hard complexity budget $k_{\max}$ (maximum number of tree nodes) and systematically enumerates every possible expression tree up to size $k_{\max}$, considering all labelings of internal nodes with operators of appropriate arity.
Unique expressions are identified at two levels:
- Structural deduplication: Only tree shapes that yield valid, acyclic computation graphs are retained.
- Algebraic equivalence: For each candidate, aggressive symbolic simplification (commutativity, distributivity, parametric redefinition, and canonicalization) is employed to collapse all algebraically equivalent forms to a canonical representative; parameter permutations and integer constant simplification are incorporated.
For each unique expression, free parameters are optimized using numerical likelihood maximization, and the full list of fitted models is ranked by a composite description-length objective.
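To make the enumerate-canonicalize-deduplicate loop concrete, here is a minimal Python sketch (not the actual ESR implementation; the toy basis and helper names are invented, and real ESR canonicalization, e.g. parameter permutations and integer-constant handling, is considerably more aggressive):

```python
# Illustrative sketch of ESR-style enumeration and deduplication: build all
# expression trees up to a node budget over a toy basis, then collapse
# algebraic duplicates via sympy simplification and canonical-string hashing.
from itertools import product
import sympy as sp

x, a0 = sp.symbols("x a0")          # variable and one free parameter
NULLARY = [x, a0]                   # arity-0 basis symbols (leaves)
UNARY = [sp.exp, lambda e: -e]      # arity-1 operators
BINARY = [sp.Add, sp.Mul]           # arity-2 operators

def enumerate_exprs(k):
    """All expressions whose tree uses exactly k nodes."""
    if k == 1:
        return list(NULLARY)
    exprs = []
    # Unary root: one operator node plus a (k-1)-node subtree.
    for op, sub in product(UNARY, enumerate_exprs(k - 1)):
        exprs.append(op(sub))
    # Binary root: split the remaining k-1 nodes between two subtrees.
    for left_k in range(1, k - 1):
        for op, l, r in product(BINARY, enumerate_exprs(left_k),
                                enumerate_exprs(k - 1 - left_k)):
            exprs.append(op(l, r))
    return exprs

def unique_up_to(k_max):
    """Canonicalize and hash to keep one representative per algebraic class."""
    seen, reps = set(), []
    for k in range(1, k_max + 1):
        for e in enumerate_exprs(k):
            key = sp.srepr(sp.simplify(e))  # canonical string as hash key
            if key not in seen:
                seen.add(key)
                reps.append(e)
    return reps

print(len(unique_up_to(4)), "unique expressions at k_max = 4")
```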
2. Minimum Description Length (MDL) Model Selection
The MDL score for a candidate function $f$ with maximum-likelihood parameters $\hat{\boldsymbol{\theta}}$ is formulated in nats as

$$L(D) = -\log\mathcal{L}(\hat{\boldsymbol{\theta}}) + k\log n + \sum_j \log c_j - \frac{p}{2}\log 3 + \sum_{i=1}^{p}\left(\frac{1}{2}\log I_{ii} + \log\bigl|\hat{\theta}_i\bigr|\right),$$

where:
- $-\log\mathcal{L}(\hat{\boldsymbol{\theta}})$: minus log-likelihood for the data under $f$ (maximum-likelihood parameter fit); under Gaussian noise, $-\log\mathcal{L}(\hat{\boldsymbol{\theta}}) = \frac{1}{2}\sum_i \bigl[y_i - f(x_i;\hat{\boldsymbol{\theta}})\bigr]^2/\sigma_i^2 + \text{const}$
- $k$: number of tree nodes; $n$: basis/operator set size ($\log n$ nats per node)
- $\sum_j \log c_j$: cumulative penalty for integer constants $c_j$ arising in normalization/canonicalization
- $p$: number of free continuous parameters
- $I_{ii}$: observed Fisher information matrix diagonal at $\hat{\boldsymbol{\theta}}$ (second derivatives of $-\log\mathcal{L}$)
- $-\frac{p}{2}\log 3$: absorbed constant from parameter precision coding
This information-theoretic trade-off encodes both model complexity (structural and parametric) and data-fit (likelihood), favoring parsimonious, high-accuracy models.
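A minimal numerical sketch of this codelength, assuming Gaussian noise; the helper names (`neg_log_like`, `mdl_score`) and the finite-difference Fisher approximation are illustrative, not part of the ESR package:

```python
import numpy as np

def neg_log_like(theta, x, y, sigma, f):
    """Gaussian negative log-likelihood of model f(x; theta)."""
    r = (y - f(x, theta)) / sigma
    return 0.5 * np.sum(r**2) + np.sum(np.log(sigma)) + 0.5 * x.size * np.log(2 * np.pi)

def mdl_score(theta_hat, x, y, sigma, f, k, n_ops, int_consts=()):
    """Codelength in nats per the formula above; theta_hat is the ML fit."""
    nll = neg_log_like(theta_hat, x, y, sigma, f)
    p = len(theta_hat)
    # Diagonal of the observed Fisher information via central differences.
    I_diag = np.empty(p)
    for i in range(p):
        h = 1e-4 * max(abs(theta_hat[i]), 1.0)
        tp, tm = theta_hat.copy(), theta_hat.copy()
        tp[i] += h
        tm[i] -= h
        I_diag[i] = (neg_log_like(tp, x, y, sigma, f) - 2 * nll
                     + neg_log_like(tm, x, y, sigma, f)) / h**2
    param_term = np.sum(0.5 * np.log(I_diag) + np.log(np.abs(theta_hat)))
    return (nll + k * np.log(n_ops) + sum(np.log(c) for c in int_consts)
            - 0.5 * p * np.log(3) + param_term)

# Example: score the (hypothetical) candidate y = a0 * x on synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 32)
sigma = np.full_like(x, 0.1)
y = 2.5 * x + rng.normal(0, 0.1, x.size)
print(mdl_score(np.array([2.5]), x, y, sigma, lambda x, t: t[0] * x, k=3, n_ops=8))
```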
3. Search and Fitting Algorithmic Workflow
The ESR pipeline follows this deterministic procedure:
- Tree template enumeration: Generate all operator-arity tree shapes up to complexity $k_{\max}$ (structural enumeration).
- Operator assignment: For each template, exhaustively assign basis operators to tree nodes; maintain only valid, executable expression candidates.
- Canonicalization and deduplication: Simplify and hash expressions to collapse all algebraic duplicates; retain only the canonical form.
- Parameter fitting: For each unique expression, perform nonlinear optimization (e.g., BFGS) of real parameters to maximize the data likelihood; employ multi-starts for robustness against local optima (see the sketch after this list).
- Scoring: Compute MDL score for each fitted model using the formula above.
- Ranking: Sort all models by increasing $L(D)$; present the top-ranked model(s) or those within an information threshold $\Delta L$.
Strong pruning heuristics (early canonicalization, partial tree simplification, parallelized fitting) and pre-computation of tree templates are applied to make exhaustive search computationally feasible up to $k_{\max} \approx 10$.
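The multi-start fitting step can be sketched as follows, reusing the hypothetical `neg_log_like` helper from the previous snippet; the restart count and seed distribution are illustrative, not ESR's actual defaults:

```python
import numpy as np
from scipy.optimize import minimize

def fit_params(f, x, y, sigma, p, n_starts=20, seed=0):
    """Maximize the likelihood of f(x; theta) over p parameters by running
    BFGS from several random initial guesses and keeping the best result."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        theta0 = rng.normal(0.0, 3.0, size=p)  # dispersed random restart
        res = minimize(neg_log_like, theta0, args=(x, y, sigma, f), method="BFGS")
        if res.success and (best is None or res.fun < best.fun):
            best = res
    return best  # None if every restart failed to converge
```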
4. Complexity and Computational Feasibility
The number of possible expressions before deduplication grows as $\sim C_k\,n^k$, where $C_k$ is the $k$-th Catalan number (counting tree shapes) and $n$ the operator-set size. At $k_{\max} = 10$ this amounts to billions of raw candidates, yet hash-based deduplication reduces the required parameter fits by up to three orders of magnitude (e.g., roughly $5.2$M unique expressions at $k_{\max} = 10$).
With aggressive parallelization, empirical timing data indicate that at $k_{\max} = 10$ enumeration and deduplication complete in minutes, while parameter fitting dominates at the scale of CPU-hours on $196$ cores. Wall time grows exponentially with $k_{\max}$; beyond $k_{\max} \approx 10$, feasibility degrades rapidly.
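A quick back-of-the-envelope of the pre-deduplication growth (the basis size is illustrative, and the exact ESR count depends on the mix of operator arities in the basis):

```python
from math import comb

def catalan(k):
    """Number of distinct binary-tree shapes with k internal nodes."""
    return comb(2 * k, k) // (k + 1)

n = 8  # illustrative operator-basis size
for k in (5, 8, 10):
    print(f"k={k:2d}: ~{catalan(k) * n**k:.3e} labeled trees before dedup")
```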
5. Comparison with Traditional Genetic Programming Symbolic Regression
| Property | ESR | Stochastic GP |
|---|---|---|
| Search completeness | Deterministic, globally complete up to $k_{\max}$ | Probabilistic, may miss optimum |
| Parameter fitting | Full numerical optimization per candidate | Local or post-hoc fitting often used |
| Model ranking | Single scalar MDL score | Pareto front, selection is ad hoc |
| Algebraic equivalence | Canonicalization eliminates duplicates | No guarantee |
| Computational expense | High but parallelizable; bounded by $k_{\max}$ | Scalable but non-deterministic |
ESR guarantees discovery of all possible functions (within the basis and $k_{\max}$) and provides a single, principled model-selection mechanism. Empirically, ESR finds the true optimal formula in toy and physical-law benchmarks where stochastic GP often returns approximate forms or misses sharp performance transitions at threshold complexity.
6. Notable Empirical Results and Applications
Astrophysical applications demonstrate ESR’s impact:
- Cosmic expansion rate $H(z)$: ESR+MDL selects simpler functional forms with significantly better description lengths than the standard $\Lambda$CDM expression $H^2(z) = H_0^2\bigl[\Omega_m(1+z)^3 + \Omega_\Lambda\bigr]$, for both cosmic-chronometer (32 points) and Pantheon+ supernova (1590 points) data. The MDL-optimal forms may suggest overparameterization or non-uniqueness of standard cosmological models at current data quality.
- MOND radial-acceleration relation: ESR recovers hundreds of functions that outperform the classic interpolating functions in MDL; most best-fit models do not reproduce the deep-MOND limit, indicating that the data alone do not uniquely specify the functional form.
- Inflationary potential reconstruction: ESR with a language-model grammar prior (Katz back-off $n$-gram) recovers physically preferred functional forms, with the literature-standard potentials (Starobinsky, quadratic, quartic) ranking far lower in MDL.
Benchmarking on toy problems further demonstrates that ESR rediscovers canonical formulas (e.g., normalized Gaussian) exactly, identifying the minimal complexity required for zero-error representation.
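As an illustrative toy check in this spirit, reusing the hypothetical `mdl_score` sketch from Section 2 (the node counts `k` assigned to each form are assumptions), the parameter-free normalized Gaussian attains a lower codelength than an otherwise identical form carrying a free amplitude:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200)
sigma = np.full_like(x, 0.01)
y = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) + rng.normal(0, 0.01, x.size)

# Exact normalized Gaussian: no continuous parameters (p = 0).
gauss_exact = lambda x, t: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
# Same shape with a free amplitude: one continuous parameter (p = 1).
gauss_amp = lambda x, t: t[0] * np.exp(-x**2 / 2)

# Node counts k are illustrative; the parameter-free form avoids the
# continuous-parameter penalty and so scores a lower codelength here.
print(mdl_score(np.array([]), x, y, sigma, gauss_exact, k=9, n_ops=8))
print(mdl_score(np.array([1 / np.sqrt(2 * np.pi)]), x, y, sigma,
                gauss_amp, k=8, n_ops=8))
```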
7. Limitations, Extensions, and Significance
ESR is limited by exponential scaling in complexity and operator-set size. Beyond roughly $k_{\max} \approx 10$, or with very large operator banks, run-time and memory costs become prohibitive. Additionally, ESR explores only the expressivity permitted by the fixed basis and grammar; unrepresented classes of expressions, and highly nested forms exceeding $k_{\max}$, are inaccessible.
Potential extensions involve hierarchical or language-model-based priors for operator selection (as shown in inflationary reconstructions), hybrid ESR-GP approaches (using ESR-generated models as seeds for further stochastic search at higher complexity), and the application of ESR to alternative coding/decoding or noise models. The MDL framework could also be adapted to cross-validation or predictive criteria when measurement uncertainties are unknown.
In summary, ESR provides a deterministic, reproducible, and interpretable pipeline for symbolic regression, with a mathematically grounded model-selection criterion and guaranteed completeness within its search space. Studies demonstrate that ESR can outperform dominant heuristic algorithms in both classical and scientific symbolic discovery, offering a credible alternative for critical domains where model optimality and interpretability are paramount (Desmond 2025; Bartlett et al. 2022).