EvoSLD: Automated Scaling Law Discovery
- EvoSLD is an automated framework that uses evolutionary algorithms and LLM guidance to evolve parametric symbolic scaling laws from grouped deep learning data.
- It co-evolves symbolic expression templates and group-specific optimization subroutines, achieving lower NMSE than fixed-form and human-derived baselines.
- Its interpretable, compact law forms generalize across various scaling regimes, significantly reducing the traditional trial-and-error in scientific discovery.
EvoSLD is an automated framework for neural Scaling Law Discovery (SLD) that integrates evolutionary algorithms with LLM guidance to simultaneously evolve parametric symbolic law forms and their optimization strategies. Designed for research on scaling trends across deep learning systems, EvoSLD generalizes over diverse experimental regimes, grouped data structures, and model classes, yielding interpretable, compact law forms with strong empirical predictive performance. The framework addresses the challenge of automated scientific discovery traditionally reliant on significant human trial-and-error in scaling law formulation and curve fitting (Lin et al., 27 Jul 2025).
1. Problem Formulation and Objective
EvoSLD operates on datasets of the form $\mathcal{D} = \{(\mathbf{x}_i, y_i, c_i)\}$, where $\mathbf{x}_i$ denotes the scaling variables (such as model size, dataset size), $y_i$ is the response metric (e.g., test loss), and $c_i$ indexes the control-variable group (e.g., task, architecture). The goal is to discover a single symbolic expression $f(\mathbf{x}; \boldsymbol{\theta})$ —parameterized by coefficients $\boldsymbol{\theta}$—such that, for each control group $c$, there exists a group-specific coefficient set $\boldsymbol{\theta}_c$ yielding minimal fitting error across the partitions $\mathcal{D}_c$.
The core SLD objective optimized by EvoSLD is:

$$\min_{f \in \mathcal{F}} \; \sum_{c} \mathcal{L}\left(f, \boldsymbol{\theta}_c; \mathcal{D}_c\right) \quad \text{subject to} \quad |\boldsymbol{\theta}| \le K,$$

where $\mathcal{F}$ is the space of candidate symbolic expressions, $\mathcal{L}$ is a group-wise fitting loss (group-normalized mean squared error, NMSE), and the coefficient cap $K$ is a parsimony constraint penalizing the number of free coefficients in $f$. Predictive quality is assessed via group NMSE and normalized mean absolute error (NMAE) on held-out test sets.
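As a concrete sketch, the grouped objective can be computed as follows; `expr`, `coeffs_by_group`, and `data_by_group` are hypothetical names, and the per-group variance normalization and additive parsimony penalty are assumptions based on the description above:

```python
import numpy as np

def grouped_nmse(expr, coeffs_by_group, data_by_group):
    """Average group-normalized MSE of one symbolic expression across all
    control groups, each scored with its own refitted coefficients."""
    losses = []
    for c, (x, y) in data_by_group.items():
        pred = expr(x, coeffs_by_group[c])
        # Normalize by the group's variance so differently scaled
        # response metrics contribute comparably.
        losses.append(np.mean((pred - y) ** 2) / np.var(y))
    return float(np.mean(losses))

def sld_objective(expr, coeffs_by_group, data_by_group, n_coeffs, lam=0.01):
    # Fitting loss plus a parsimony penalty on the free-coefficient count.
    return grouped_nmse(expr, coeffs_by_group, data_by_group) + lam * n_coeffs
```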
2. The EvoSLD Algorithm: Co-Evolutionary Workflow
EvoSLD implements evolutionary search over code subroutines, co-evolving two critical modules:
- Expression subroutine: Encodes the symbolic law template $f(\mathbf{x}; \boldsymbol{\theta})$, with free (to-be-refit) parameters $\boldsymbol{\theta}$.
- Optimization subroutine: Programmatic logic that partitions $\mathcal{D}$ into control groups, optimizes $\boldsymbol{\theta}_c$ for each group $c$, and outputs the total group-normalized loss.
The workflow encompasses:
- Initialization: A population of candidate (expression, optimization) pairs is seeded with simple “naive power-law” forms (e.g., $\hat{y} = a\,x^{-b}$) and standard optimizers (BFGS).
- Evolutionary Loop (typically 50 generations, with a multi-island population and migration):
- Selection: High-fitness parent pairs are drawn based on group-NMSE.
- LLM-Guided Mutation: An LLM is prompted to generate either (i) symbolic mutations to expressions (e.g., from a power law to a broken power law), or (ii) tweaks to the optimizer (e.g., update strategies, initialization).
- Evaluation: Resulting child pairs are run, refitted per group, and scored by held-out NMSE.
- Database Update: The most competitive pairs are retained.
- Termination: The top-scoring symbolic expression is selected.
Co-evolving both the law structure and group-wise optimizer modules is essential for robust coefficient estimation, especially under sparse, heterogeneous, or multi-group data regimes.
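The loop above can be sketched as follows; `Candidate`, `evaluate`, and `mutate` are hypothetical stand-ins for EvoSLD's actual subroutine representation, held-out scoring, and LLM-guided mutation:

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    expr_src: str                  # source of the symbolic-expression subroutine
    opt_src: str                   # source of the group-wise optimizer subroutine
    fitness: float = float("inf")  # held-out group-NMSE (lower is better)

def evolve(population, evaluate, mutate, generations=50, survivors=8):
    """Minimal (expression, optimizer) co-evolution loop: select low-NMSE
    parents, mutate either subroutine, keep the most competitive pairs."""
    for cand in population:
        cand.fitness = evaluate(cand)
    for _ in range(generations):
        # Selection: favor high-fitness (low group-NMSE) parents.
        parents = sorted(population, key=lambda c: c.fitness)[:survivors]
        # LLM-guided mutation stand-in: perturb expression or optimizer.
        children = [mutate(random.choice(parents)) for _ in range(survivors)]
        for child in children:
            child.fitness = evaluate(child)
        # Database update: retain the most competitive pairs.
        population = sorted(parents + children, key=lambda c: c.fitness)[:survivors]
    # Termination: return the top-scoring pair.
    return min(population, key=lambda c: c.fitness)
```

In the real framework the mutation step is an LLM call and evaluation refits coefficients per group; the toy fitness below only exercises the selection/update mechanics.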
3. Scaling Law Formulations and Search Space
While EvoSLD’s hypothesis space supports highly expressive forms (sums/products of powers, exponentials, harmonics), the majority of discovered scaling laws conform to physically interpretable templates such as:
$$\hat{y} = E + \sum_{j} a_j\, x_j^{-b_j},$$

where the $x_j$ are the scaling variables, $a_j$, $b_j$ are group-specific or group-shared coefficients, and $E$ is typically interpreted as an irreducible error floor. The law space is explicitly hard-bounded by a maximum number of coefficients $K$, enforcing parsimony and mitigating overfitting.
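A per-group fit of such a sum-of-powers template with an error floor can be sketched with SciPy's BFGS (the seed optimizer mentioned above); `fit_group` and its parameterization are illustrative assumptions, not the framework's actual code:

```python
import numpy as np
from scipy.optimize import minimize

def fit_group(x, y, n_terms=2):
    """Fit y_hat = E + sum_j a_j * x[:, j]**(-b_j) to one control group's
    data by minimizing group-normalized MSE with BFGS."""
    def predict(theta):
        E, ab = theta[0], theta[1:].reshape(n_terms, 2)
        return E + np.sum(ab[:, 0] * x ** (-ab[:, 1]), axis=1)
    def nmse(theta):
        return np.mean((predict(theta) - y) ** 2) / np.var(y)
    theta0 = np.ones(1 + 2 * n_terms)  # floor E, then (a_j, b_j) pairs
    res = minimize(nmse, theta0, method="BFGS")
    return res.x, res.fun
```

In the grouped setting this fit is repeated once per control group, and the per-group NMSE values are averaged into the overall objective.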
4. Experimental Scenarios and Baselines
EvoSLD was validated across five real-world SLD scenarios, each drawn from contemporary scaling law literature:
| Scenario | Controls | Example Law Form | Coefficient Cap |
|---|---|---|---|
| Vocabulary Size | None | Powers over three scaling variables | $7$ |
| Supervised Fine-Tuning (SFT) | Architecture × Task | — | $4$ |
| Domain Mixture | Model Size | Per-domain powers | — |
| Mixture-of-Experts (MoE) | None | — | $6$ |
| Data-Constrained Pretraining | None | Powers over three scaling axes | $7$ |
The dataset splits were group- or random-based, with held-out scales reserved for final evaluation. Baselines included fixed power-law SLD, symbolic regression (PySR, GPlearn), EvoSLD with a fixed optimizer, and published human-derived laws with coefficients re-fit under EvoSLD’s grouped optimizer.
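A held-out-scale split of the kind described can be sketched as follows (the function and its arguments are hypothetical):

```python
import numpy as np

def heldout_scale_split(x, y, groups, scale_col=0, frac=0.25):
    """Reserve the largest values of one scaling axis for testing, so the
    fitted law is judged on extrapolation rather than interpolation."""
    cutoff = np.quantile(x[:, scale_col], 1.0 - frac)
    test = x[:, scale_col] > cutoff
    train = ~test
    return (x[train], y[train], groups[train]), (x[test], y[test], groups[test])
```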
5. Empirical Performance and Case Studies
On all scenarios except Data-Constrained (where it ranked second in NMAE), EvoSLD achieved the best NMSE/NMAE on held-out sets. In the Vocabulary and SFT tasks, EvoSLD exactly rediscovered the published law forms. In the others, it surpassed human-derived and symbolic-regression baselines, reducing NMSE by orders of magnitude. Notably, the discovered expressions required significantly fewer coefficients than allowed (e.g., only $4$ in MoE).
Notable Results
- Vocabulary Size: Exact recovery of the published vocabulary scaling law (identical to Tao et al. 2024).
- SFT: Exact recovery of the published SFT scaling law.
- Domain Mixture: EvoSLD discovered an alternative compact form, obtaining lower NMSE than the $0.0669$ of the prior law.
- Data-Constrained: Identified a compact three-term law with lower NMSE than the two-term human baseline.
- Mixture-of-Experts: Produced a law requiring only $4$ coefficients and achieving superior NMSE.
Conventional symbolic regression consistently failed on, or could not handle, grouped control-variable settings, and EvoSLD ablated to a fixed optimizer exhibited higher NMSE, highlighting the necessity of optimizer co-evolution.
6. Interpretability, Efficiency, and Limitations
EvoSLD directly enforces parsimony via hard coefficient caps and LLM prior-guidance toward canonical operators (powers, exponentials). Discovered scaling laws are typically succinct, aligning with prevailing notions of scientific simplicity and physical plausibility.
A full EvoSLD search, including all code-subroutine mutations and cross-validation, completes in minutes on commodity hardware, in contrast with the week-long manual analysis required of domain experts. However, EvoSLD's reliance on static, published datasets limits its ability to select new experimental points, making robust coefficient fitting challenging amid sparse group data. Independent evolutionary runs tend to yield diverse symbolic forms, suggesting that the system's synthesis is genuine and not simply a product of LLM pretraining exposure.
Planned future directions include expanding EvoSLD to active agentic modes: proposing new experiments, executing autonomous data collection in sandboxed environments, and conducting formal statistical evaluations.
7. Summary and Broader Implications
EvoSLD constitutes the first evolutionary, LLM-guided framework (as of 2025) for scaling law discovery that (i) formalizes SLD with grouped fits and strict parsimony, (ii) co-evolves both symbolic law forms and their optimizing algorithms, and (iii) empirically matches or surpasses established laws in challenging deep learning scaling scenarios (Lin et al., 27 Jul 2025). This suggests automated SLD may substantially reduce the manual trial-and-error of scaling-curve analysis in future AI systems research, provided ongoing challenges in experiment generation and model selection are addressed.