
EvoSLD: Automated Scaling Law Discovery

Updated 28 February 2026
  • EvoSLD is an automated framework that uses evolutionary algorithms and LLM guidance to evolve parametric symbolic scaling laws from grouped deep learning data.
  • It co-evolves symbolic expression templates and group-specific optimization subroutines, achieving lower NMSE than fixed-form and human-derived baselines.
  • Its interpretable, compact law forms generalize across various scaling regimes, significantly reducing the traditional trial-and-error in scientific discovery.

EvoSLD is an automated framework for neural Scaling Law Discovery (SLD) that integrates evolutionary algorithms with LLM guidance to simultaneously evolve parametric symbolic law forms and their optimization strategies. Designed for research on scaling trends across deep learning systems, EvoSLD generalizes over diverse experimental regimes, grouped data structures, and model classes, yielding interpretable, compact law forms with strong empirical predictive performance. The framework addresses the challenge of automated scientific discovery traditionally reliant on significant human trial-and-error in scaling law formulation and curve fitting (Lin et al., 27 Jul 2025).

1. Problem Formulation and Objective

EvoSLD operates on datasets of the form $\mathcal D = \{(x_i, y_i, c_i)\}_{i=1}^m$, where $x_i \in \mathbb R^n$ denotes the $n$ scaling variables (such as model size and dataset size), $y_i \in \mathbb R$ is the response metric (e.g., test loss), and $c_i \in \mathcal C$ indexes the control-variable group (e.g., task, architecture). The goal is to discover a single symbolic expression $f(x;\theta)$, parameterized by coefficients $\theta$, such that for each control group $c \in \mathcal C$ there exists a group-specific coefficient set $\theta_c$ yielding minimal fitting error across the partitions $\{\mathcal D_c\}_{c \in \mathcal C}$.

The core SLD objective optimized by EvoSLD is:

$$f^* = \arg\min_{f \in \mathcal F} \left\{ \sum_{c \in \mathcal C} \min_{\theta_c} \mathcal L(f(\cdot;\theta_c); \mathcal D_c) + \lambda\,\Omega(f) \right\},$$

where $\mathcal F$ is the space of candidate symbolic expressions, $\mathcal L$ is a group-wise fitting loss (group-normalized mean squared error, NMSE), and $\Omega(f)$ is a parsimony penalty on the number of free coefficients in $f$. Predictive quality is assessed via group NMSE and normalized mean/absolute error metrics on held-out test sets.
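
The grouped objective above can be sketched in a few lines of Python. The names `sld_objective`, `group_nmse`, and `fit_group` are illustrative rather than from the paper, and the refitting callback stands in for whatever optimizer subroutine a candidate carries:

```python
import numpy as np

def group_nmse(y_true, y_pred):
    """Normalized MSE within one control group: MSE divided by the target
    variance, so groups on different loss scales contribute comparably."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def sld_objective(f, fit_group, data_by_group, lam=0.0, n_coeffs=0):
    """Sum of per-group NMSE after refitting theta_c for each control group,
    plus a parsimony penalty lam * n_coeffs standing in for Omega(f)."""
    total = 0.0
    for x, y in data_by_group.values():
        theta_c = fit_group(f, x, y)           # group-specific coefficients
        total += group_nmse(y, f(x, theta_c))
    return total + lam * n_coeffs
```

Because $\theta_c$ is refit inside the objective, two candidate expressions are compared only after each has been given its best group-wise coefficients.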

2. The EvoSLD Algorithm: Co-Evolutionary Workflow

EvoSLD implements evolutionary search over code subroutines, co-evolving two critical modules:

  • Expression subroutine: encodes the symbolic law template $f(x;\theta)$, with free (to-refit) parameters $\theta$.
  • Optimization subroutine: programmatic logic that partitions $\mathcal D$ into control groups, optimizes $\theta_c$ for each group, and outputs the total group-normalized loss.
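
As a concrete (hypothetical) data structure, a population member can be represented as such a pair of callables. The `random_search` routine below is a deliberately naive stand-in for the fitting subroutine, not the BFGS-based optimizer the paper seeds with:

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class Candidate:
    """One individual in the population: a symbolic law template together
    with the subroutine that fits its coefficients per control group.
    EvoSLD mutates both halves; the field names here are illustrative."""
    expression: Callable   # (x, theta) -> predicted y
    n_coeffs: int          # number of free coefficients in theta
    optimize: Callable     # (expression, x, y, n_coeffs) -> fitted theta

def random_search(expression, x, y, n_coeffs, iters=2000, seed=0):
    """Naive fitting subroutine: sample coefficient vectors uniformly and
    keep the one with the smallest squared error on this group's data."""
    rng = np.random.default_rng(seed)
    best_theta, best_sse = None, np.inf
    for _ in range(iters):
        theta = rng.uniform(-3.0, 3.0, size=n_coeffs)
        sse = float(np.sum((y - expression(x, theta)) ** 2))
        if sse < best_sse:
            best_theta, best_sse = theta, sse
    return best_theta
```

Because the optimizer is itself code, a mutation can swap `random_search` for a gradient-based routine, change its initialization, or alter how groups share coefficients.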

The workflow encompasses:

  1. Initialization: A population of candidate (expression, optimization) pairs is seeded with simple naive power-law forms (e.g., $f(x) = \sum_i k_i x_i^{\alpha_i} + C$) and standard optimizers (BFGS).
  2. Evolutionary loop (typically 50 generations, with multiple islands and migration):
    • Selection: high-fitness parent pairs are drawn based on group NMSE.
    • LLM-guided mutation: an LLM is prompted to generate either (i) symbolic mutations to expressions (e.g., power law $\rightarrow$ broken power law) or (ii) tweaks to the optimizer (e.g., update strategies, initialization).
    • Evaluation: resulting child pairs are run, refitted per group, and scored by held-out NMSE.
    • Database update: the most competitive pairs are retained.
  3. Termination: The top-scoring symbolic expression $f^*$ is selected.
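
The steps above can be condensed into a generic loop. The `mutate` argument stands in for the LLM-guided operator, which in EvoSLD rewrites expression or optimizer source code; the skeleton below elides islands and migration:

```python
import random

def evolve(population, evaluate, mutate, generations=50, keep=8):
    """Skeleton of the evolutionary loop: select high-fitness parents,
    mutate them into children, evaluate, and retain the best candidates.
    A lower evaluate() score means fitter (e.g., held-out group NMSE)."""
    scored = sorted(population, key=evaluate)
    for _ in range(generations):
        parents = scored[:keep]                        # selection
        children = [mutate(random.choice(parents))     # LLM-guided mutation
                    for _ in range(keep)]
        scored = sorted(parents + children, key=evaluate)[:keep]  # db update
    return scored[0]                                   # best candidate f*
```

Since parents are carried over into each update, the best score is monotone non-increasing across generations.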

Co-evolving both the law structure and group-wise optimizer modules is essential for robust coefficient estimation, especially under sparse, heterogeneous, or multi-group data regimes.

3. Scaling Law Formulations and Search Space

While EvoSLD’s hypothesis space supports highly expressive forms (sums/products of powers, exponentials, harmonics), the majority of discovered scaling laws conform to physically interpretable templates such as:

$$y = a x^b + c \quad\text{or}\quad y = \sum_{i=1}^{n} A_i x_i^{\alpha_i} + C,$$

where the $x_i$ are the scaling variables, $A_i$ and $\alpha_i$ are group-specific or group-shared coefficients (the $\theta_c$), and $C$ is typically interpreted as an irreducible error floor. The law space $\mathcal F$ is explicitly hard-bounded by a maximum number of coefficients ($\tau$), enforcing parsimony and mitigating overfitting.
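
For the single-variable template $y = a x^b + c$, the three coefficients can be fit without gradients by scanning the exponent and solving for the linear pair in closed form. This grid-plus-least-squares routine is a simple illustration, not the paper's fitting procedure:

```python
import numpy as np

def fit_power_law(x, y, exponents=None):
    """Fit y ≈ a * x**b + c. For each candidate exponent b the model is
    linear in (a, c), so those are solved exactly by least squares and
    the b with the smallest squared error wins."""
    if exponents is None:
        exponents = np.linspace(-2.0, 2.0, 401)   # grid over b, step 0.01
    best = None
    for b in exponents:
        A = np.column_stack([x ** b, np.ones_like(x)])
        (a, c), *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = float(np.sum((y - (a * x ** b + c)) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(a), float(b), float(c))
    return best[1], best[2], best[3]   # a, b, c
```

The same pattern extends to the multi-variable sum-of-powers template: for any fixed set of exponents $\alpha_i$, the amplitudes $A_i$ and floor $C$ remain a linear least-squares problem.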

4. Experimental Scenarios and Baselines

EvoSLD was validated across five real-world SLD scenarios, each drawn from contemporary scaling law literature:

| Scenario | Controls | Example Law Form | Coefficient Cap ($\tau$) |
| --- | --- | --- | --- |
| Vocabulary Size | None | $L(N,V,D)$ vs. $N$, $V$, $D$; sum of powers | 7 |
| Supervised Fine-Tuning (SFT) | Architecture $\times$ Task | $L(D) = k D^\alpha + A$ | 4 |
| Domain Mixture | Model Size | sum of per-domain powers | $M(M+2)$ |
| Mixture-of-Experts (MoE) | None | $L(N,E) = k_1 N^{\alpha_1} + k_2 E^{\alpha_2} + C$ | 6 |
| Data-Constrained Pretraining | None | three scaling axes $N$, $D$, $U$ | 7 |

The dataset splits were group- or random-based, with held-out scales reserved for final evaluation. Baselines included fixed power-law SLD, symbolic regression (PySR, GPlearn), EvoSLD with a fixed optimizer, and published human-derived laws with coefficients re-fit under EvoSLD’s grouped optimizer.

5. Empirical Performance and Case Studies

On all scenarios except Data-Constrained pretraining (where it ranked second in NMAE), EvoSLD achieved the best NMSE/NMAE on held-out sets. In the Vocabulary and SFT tasks, EvoSLD exactly rediscovered the published law forms; in the others, it surpassed human-derived and symbolic regression baselines, reducing NMSE by orders of magnitude. Notably, the discovered expressions required significantly fewer coefficients than the cap allowed (e.g., only four in the MoE scenario).

Notable Results

  • Vocabulary Size: exact recovery of $L(N,V,D) = \frac{A}{N^\alpha} + \frac{B}{V^\beta} + \frac{C}{D^\gamma} + E$ (identical to Tao et al. 2024).
  • SFT: exact recovery of $L(D) = \frac{A}{D^\alpha + B} + C$.
  • Domain Mixture: EvoSLD discovered $L_i(\mathbf r) = a_i + \sum_{j=1}^{M} b_{ij} r_j^{\beta_{ij}}$, obtaining NMSE $= 0.0007$ compared to $0.0669$ for the prior law.
  • Data-Constrained: identified a compact three-term law $L(N,D,U) = L_\infty + A_1 N^{\alpha_1} + A_2 D^{\alpha_2} + A_3 U^{\alpha_3}$, with lower NMSE than the two-term human baseline.
  • Mixture-of-Experts: produced $L(N,E) = L_\infty + k_1 N^{\alpha_1} + k_2 E^{\alpha_2}$, requiring only four coefficients and achieving superior NMSE.

Conventional symbolic regression consistently failed or could not handle grouped, control-variable settings, and EvoSLD ablated to a fixed optimizer exhibited $5$–$10\times$ higher NMSE, highlighting the necessity of optimizer co-evolution.

6. Interpretability, Efficiency, and Limitations

EvoSLD directly enforces parsimony via hard coefficient caps and LLM prior-guidance toward canonical operators (powers, exponentials). Discovered scaling laws are typically succinct, aligning with prevailing notions of scientific simplicity and physical plausibility.

A full EvoSLD search, including all code-subroutine mutations and cross-validation, completes in minutes on commodity hardware, in contrast to the week-long manual analyses typical of domain experts. However, EvoSLD’s reliance on static, published datasets means it cannot select new experimental points, which makes robust coefficient fitting challenging amid sparse group data. Independent evolutionary runs tend to yield diverse symbolic forms, suggesting that the system’s synthesis is genuine rather than a replay of LLM pretraining exposure.

Planned future directions include expanding EvoSLD to active agentic modes: proposing new experiments, executing autonomous data collection in sandboxed environments, and conducting formal statistical evaluations.

7. Summary and Broader Implications

EvoSLD constitutes the first evolutionary, LLM-guided framework (as of 2025) for scaling law discovery that (i) formalizes SLD with grouped fits and strict parsimony, (ii) co-evolves both symbolic law forms and their optimizing algorithms, and (iii) empirically matches or surpasses established laws in challenging deep learning scaling scenarios (Lin et al., 27 Jul 2025). This suggests automated SLD may substantially reduce the manual effort of scaling-law analysis in future AI systems research, provided ongoing challenges in experiment generation and model selection are addressed.
