
Exhaustive Symbolic Regression (ESR)

Updated 9 November 2025
  • The framework systematically enumerates all candidate expressions up to a fixed complexity, ensuring that the globally optimal model within that budget is not missed.
  • It integrates information-theoretic, MDL-based model selection with robust, parallelized parameter fitting to overcome limitations of traditional stochastic approaches.
  • Empirical results show that the resulting models are interpretable and parsimonious and can outperform conventional genetic programming in both accuracy and efficiency.

Exhaustive Symbolic Regression (ESR) is a deterministic, globally optimal framework for symbolic regression that systematically enumerates all possible analytic expressions up to a prescribed complexity threshold and ranks them using the minimum description length (MDL) principle. ESR addresses two central limitations of conventional stochastic approaches—namely, the risk of missing optimal solutions and the ambiguity in function-selection criteria—by integrating exhaustive combinatorial search and information-theoretic model selection. The methodology is extensible to diverse operator sets and scientific domains; empirical studies show superiority or complementarity to traditional genetic programming-based methods.

1. Problem Formalization and Exhaustive Search Structure

In symbolic regression, the objective is to recover an analytic function $f(x;\theta)$ from a finite dataset $D = \{(x_i, y_i, \sigma_i)\}_{i=1}^N$, where the $y_i$ are observed targets, the $\sigma_i$ measurement uncertainties, and $f$ is drawn from expressions constructed with operators and variables from a predefined basis $\mathcal{O}$. ESR imposes a hard complexity budget $K$ (the maximum number of tree nodes) and systematically enumerates every possible expression tree up to size $K$, considering all labelings of internal nodes with operators of appropriate arity.
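To make the enumeration concrete, the following is a minimal Python sketch for a toy basis of unary and binary operators with one variable and one parameter placeholder; the basis, the `theta` leaf, and names such as `enumerate_trees` are illustrative assumptions, not the published implementation.

```python
# Toy operator basis: a real ESR basis also includes several parameter
# placeholders a0, a1, ... as leaves.
UNARY = ["exp", "inv"]
BINARY = ["+", "*", "pow"]
LEAVES = ["x", "theta"]

def enumerate_trees(k):
    """Yield every expression tree (as nested tuples) with exactly k nodes."""
    if k == 1:
        yield from LEAVES
        return
    # Root is a unary operator: one subtree holding the remaining k - 1 nodes.
    for op in UNARY:
        for child in enumerate_trees(k - 1):
            yield (op, child)
    # Root is a binary operator: split the remaining k - 1 nodes over two subtrees.
    for op in BINARY:
        for k_left in range(1, k - 1):
            for left in enumerate_trees(k_left):
                for right in enumerate_trees(k - 1 - k_left):
                    yield (op, left, right)

def enumerate_up_to(K):
    """All candidate expressions with at most K nodes, before deduplication."""
    for k in range(1, K + 1):
        yield from enumerate_trees(k)

if __name__ == "__main__":
    for K in range(1, 7):
        print(f"K = {K}: {sum(1 for _ in enumerate_up_to(K))} raw candidates")
```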

Unique expressions are identified at two levels:

  • Structural deduplication: Only tree shapes that yield valid, acyclic computation graphs are retained.
  • Algebraic equivalence: For each candidate, aggressive symbolic simplification (commutativity, distributivity, parametric redefinition, and canonicalization) is employed to collapse all algebraically equivalent forms to a canonical representative; parameter permutations and integer constant simplification are incorporated.

For each unique expression, the free parameters $\theta$ are optimized by numerical likelihood maximization, and the full list of fitted models is ranked by a composite description-length objective.
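The collapse of algebraically equivalent forms can be illustrated with SymPy; the helper `canonical_key` and the tiny candidate list below are illustrative assumptions, and a real implementation also handles parameter permutations and constant reparameterization, which this sketch omits.

```python
import sympy as sp

x, theta = sp.symbols("x theta")

def canonical_key(expr):
    """Hashable key identifying the algebraic equivalence class of a candidate."""
    return sp.srepr(sp.simplify(expr))   # canonical string of the simplified expression

# Three syntactically distinct candidates; the first two are algebraically identical.
candidates = [
    (x**2 - 1) / (x - 1),   # simplifies to x + 1
    x + 1,
    theta * sp.exp(x),
]

unique = {}
for expr in candidates:
    unique.setdefault(canonical_key(expr), expr)   # keep one representative per class

print(len(unique), "unique expressions:", list(unique.values()))
```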

2. Minimum Description Length (MDL) Model Selection

The MDL score $L(D; f)$ for a candidate $f(x;\theta)$ is formulated in nats as

$$L(D; f) = -\log \mathcal{L}(\hat\theta) + k \log n + \sum_j \log c_j - \frac{p}{2} \log 3 + \sum_{i=1}^{p} \left[ \frac{1}{2}\log I_{ii}(\hat\theta) + \log|\hat\theta_i| \right],$$

where:

  • $-\log \mathcal{L}(\hat\theta)$: negative log-likelihood of the data at the maximum-likelihood fit $\hat\theta$; under Gaussian noise,

$$-\log \mathcal{L}(\hat\theta) = \frac{1}{2} \sum_i \left[\frac{(y_i - f(x_i;\hat\theta))^2}{\sigma_i^2} + \log(2\pi \sigma_i^2)\right]$$

  • $k$: number of tree nodes; $n$: size of the operator/basis set ($\log n$ per node)
  • $\sum_j \log c_j$: cumulative penalty for integer constants arising in normalization/canonicalization
  • $p$: number of free continuous parameters
  • $I_{ii}(\hat\theta)$: diagonal of the observed Fisher information matrix at $\hat\theta$ (second derivatives of $-\log \mathcal{L}$)
  • $-\frac{p}{2}\log 3$: constant absorbed from the parameter-precision coding

This information-theoretic trade-off encodes both model complexity (structural and parametric) and data-fit (likelihood), favoring parsimonious, high-accuracy models.
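A direct transcription of this score into code may clarify the bookkeeping. The sketch below assumes a toy one-parameter model $f(x;\theta)=\theta x^2$; the node count, basis size, and integer-constant terms used in the example are illustrative placeholders, not values from the paper.

```python
import numpy as np

def gaussian_neg_log_like(y, y_model, sigma):
    """-log L for independent Gaussian errors (the likelihood expression above)."""
    return 0.5 * np.sum((y - y_model) ** 2 / sigma**2 + np.log(2 * np.pi * sigma**2))

def mdl_codelength(neg_log_like, k, n, int_constants, theta_hat, fisher_diag):
    """Description length L(D; f) in nats, term by term as in Section 2."""
    theta_hat = np.atleast_1d(np.asarray(theta_hat, dtype=float))
    fisher_diag = np.atleast_1d(np.asarray(fisher_diag, dtype=float))
    p = theta_hat.size
    structure = k * np.log(n) + sum(np.log(abs(c)) for c in int_constants)
    parameters = -0.5 * p * np.log(3) + np.sum(0.5 * np.log(fisher_diag)
                                               + np.log(np.abs(theta_hat)))
    return neg_log_like + structure + parameters

# Toy example: data generated from f(x; theta) = theta * x**2 with theta = 2.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 5.0, 40)
sigma = np.full_like(x, 0.1)
y = 2.0 * x**2 + rng.normal(0.0, sigma)

# This model is linear in theta, so the MLE and Fisher information are analytic.
theta_hat = np.sum(x**2 * y / sigma**2) / np.sum(x**4 / sigma**2)
fisher = np.sum(x**4 / sigma**2)                 # d^2(-log L)/d theta^2
nll = gaussian_neg_log_like(y, theta_hat * x**2, sigma)

# k = 5 nodes (*, theta, pow, x, 2), n = 5 basis symbols, one integer constant 2:
# all three are illustrative bookkeeping choices for this toy expression.
print(mdl_codelength(nll, k=5, n=5, int_constants=[2],
                     theta_hat=theta_hat, fisher_diag=fisher))
```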

3. Search and Fitting Algorithmic Workflow

The ESR pipeline follows this deterministic procedure:

  1. Tree template enumeration: Generate all operator-arity tree shapes up to complexity $K$ (structural enumeration).
  2. Operator assignment: For each template, exhaustively assign basis operators to tree nodes; maintain only valid, executable expression candidates.
  3. Canonicalization and deduplication: Simplify and hash expressions to collapse all algebraic duplicates; retain only the canonical form.
  4. Parameter fitting: For each unique expression, perform nonlinear optimization (e.g., BFGS) of the real parameters $\theta$ to maximize the data likelihood; employ multi-starts for robustness against local optima.
  5. Scoring: Compute MDL score for each fitted model using the formula above.
  6. Ranking: Sort all models by increasing $L(D; f)$; present the top-ranked model(s) or those within an information threshold $\Delta L$.

Strong pruning heuristics (early canonicalization, partial tree simplification, parallelized fitting) and pre-computation of tree templates are applied to make exhaustive search computationally feasible for $K \lesssim 10$.
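The fitting and ranking steps (4 to 6) might look roughly like the following sketch; the two candidate expressions, the synthetic power-law data, and the BIC-like stand-in for the full MDL score are all illustrative assumptions rather than the authors' pipeline.

```python
import numpy as np
from scipy.optimize import minimize

def fit_multistart(neg_log_like, p, n_starts=20, seed=0):
    """Step 4: minimize -log L from several random starts with BFGS; keep the best optimum."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        res = minimize(neg_log_like, rng.normal(0.0, 1.0, size=p), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best.x, best.fun

# Synthetic data drawn from an assumed power law, used only to exercise steps 4-6.
rng = np.random.default_rng(1)
x = np.linspace(0.5, 4.0, 50)
sigma = np.full_like(x, 0.1)
y = 2.5 * x**1.5 + rng.normal(0.0, sigma)

# Two hypothetical candidates that survived deduplication, each with p = 2 parameters.
candidates = {
    "theta0 * x**theta1":  lambda th: th[0] * x**th[1],
    "theta0 + theta1 * x": lambda th: th[0] + th[1] * x,
}

def make_nll(model):
    """Gaussian -log L for a candidate (Section 2)."""
    return lambda th: 0.5 * np.sum((y - model(th))**2 / sigma**2
                                   + np.log(2 * np.pi * sigma**2))

scores = {}
for name, model in candidates.items():
    theta_hat, nll = fit_multistart(make_nll(model), p=2)
    # Step 5: the full pipeline would evaluate the complete MDL score of Section 2 here;
    # a BIC-like penalty is used as a crude stand-in for the structural and parameter terms.
    scores[name] = nll + 0.5 * 2 * np.log(x.size)

# Step 6: rank candidates by description length (lower is better).
for name in sorted(scores, key=scores.get):
    print(f"{scores[name]:12.2f}  {name}")
```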

4. Complexity and Computational Feasibility

The number of possible expressions before deduplication grows as $O(C_K n^K)$, where $C_K$ is the $K$-th Catalan number (counting tree shapes) and $n$ the operator-set size. For $n \sim 8$ and $K = 10$ this exceeds $10^6$ expressions, yet hash-based deduplication reduces the number of expressions requiring parameter fits by more than an order of magnitude (e.g., $\sim 5.2$ million enumerated trees collapse to $\sim 120{,}000$ unique expressions for $n = 8$, $K = 10$).
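The growth of this raw bound can be checked in a few lines; the numbers printed below are only the loose $C_K n^K$ upper bound, which arity constraints and validity pruning reduce to the much smaller counts quoted above.

```python
from math import comb

def catalan(k):
    """Number of distinct binary-tree shapes with k internal nodes."""
    return comb(2 * k, k) // (k + 1)

n = 8                                # assumed operator/basis size
for K in (4, 6, 8, 10, 12):
    bound = catalan(K) * n**K        # loose C_K * n^K bound before arity/validity pruning
    print(f"K = {K:2d}   C_K * n**K ~ {bound:.3e}")
# Arity constraints and validity checks cut the actual enumeration far below this bound
# (~5.2 million trees at K = 10 in the text), and deduplication then leaves ~1.2e5
# unique expressions that actually need parameter fits.
```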

With aggressive parallelization, empirical timing data indicate that for $K = 10$, enumeration and deduplication require $\sim 46$ minutes and parameter fitting $\sim 150$ CPU-hours on 196 cores. Wall time scales as $(1.5n)^K$; beyond $K = 11$, feasibility degrades rapidly.

5. Comparison with Traditional Genetic Programming Symbolic Regression

| Property | ESR | Stochastic GP |
|----------|-----|---------------|
| Search completeness | Deterministic, globally complete up to $K$ | Probabilistic, may miss the optimum |
| Parameter fitting | Full numerical optimization per candidate | Local or post-hoc fitting often used |
| Model ranking | Single scalar MDL score | Pareto front; selection is ad hoc |
| Algebraic equivalence | Canonicalization eliminates duplicates | No guarantee |
| Computational expense | High but parallelizable; bounded by $K$ | Scalable but non-deterministic |

ESR guarantees discovery of all possible functions (within $K$) and a single, principled model-selection mechanism. Empirically, ESR finds the true optimal formula in toy and physical-law benchmarks where stochastic GP often returns approximate forms or misses sharp performance transitions at threshold complexity.

6. Notable Empirical Results and Applications

Astrophysical applications demonstrate ESR’s impact:

  • Cosmic expansion rate $H(z)$: ESR+MDL selects simpler models ($H^2(z) = \theta_0(1+z)^2$; $k=5$) with significantly better description lengths $\Delta L$ than the standard $\Lambda$CDM formula ($H^2(z) = \theta_0 + \theta_1(1+z)^3$; $k=7$), for both cosmic-chronometer (32 points) and Pantheon+ supernova (1590 points) data. The MDL-optimal forms may suggest overparameterization or non-uniqueness in standard cosmological models for current data quality.
  • MOND radial-acceleration relation: ESR recovers hundreds of functions outperforming the classic $\nu$-functions in MDL; most best-fit models do not reproduce the deep-MOND limit, indicating that the data alone do not uniquely specify the functional form.
  • Inflationary potential reconstruction: ESR with a grammar prior (Katz $n$-gram) recovers physically preferred functional forms (e.g., $V(\phi) = \theta_0 [\theta_1 + \log^2\phi]$), with the standard literature potentials (Starobinsky, quadratic, quartic) ranked far less favorably by MDL.

Benchmarking on toy problems further demonstrates that ESR rediscovers canonical formulas (e.g., normalized Gaussian) exactly, identifying the minimal complexity required for zero-error representation.

7. Limitations, Extensions, and Significance

ESR is limited by exponential scaling in complexity and operator-set size. For $K > 11$ or very large operator banks, run-time and memory costs become prohibitive. Additionally, ESR only explores the expressivity permitted by the fixed basis and grammar; unrepresented classes of expressions or highly nested forms beyond $K$ are inaccessible.

Potential extensions involve hierarchical or language-model-based priors for operator selection (as shown in inflationary reconstructions), hybrid ESR-GP approaches (using ESR-generated models as seeds for further stochastic search at higher complexity), and the application of ESR to alternative coding/decoding or noise models. The MDL framework could also be adapted to cross-validation or predictive criteria when measurement uncertainties are unknown.

In summary, ESR provides a definitive, reproducible, and interpretable pipeline for symbolic regression, with a mathematically grounded model-selection criterion and deterministic convergence properties. Studies demonstrate that ESR can outperform dominant heuristic algorithms in both classical and scientific symbolic discovery, offering a credible alternative for critical domains where model optimality and interpretability are paramount (Desmond, 17 Jul 2025, Bartlett et al., 2022).
