
Scaling-Law Guided Search

Updated 8 February 2026
  • Scaling-Law Guided Search is a methodology that leverages power-law relationships between model size, data volume, and performance to predict optimal configurations.
  • It employs pilot experiments and log–log regression to extrapolate scaling curves, reducing computational cost by up to 100× compared to exhaustive grid searches.
  • The approach applies across domains—sequential recommendation, LLM fine-tuning, and test-time inference—delivering measurable gains in efficiency and predictive accuracy.

Scaling-Law Guided (SLG) Search is a class of algorithmic methodologies that exploit empirical scaling laws—typically power-law relations between model size, dataset size, and performance metrics—to efficiently and systematically allocate resources in neural network training, model selection, and test-time inference. SLG Search has emerged independently across distinct domains, such as sequential recommendation models, resource-constrained LLM selection, and test-time reward optimization. These techniques obviate brute-force grid search by leveraging fitted scaling-law curves to predict optimal or near-optimal configurations under various compute, data, or evaluation budgets.

1. Foundational Principles and Motivation

The inception of SLG Search is rooted in the observation that many neural model classes (Transformers, LLMs, recommender models) exhibit smooth, empirically measurable scaling laws: the dependence of loss or reward on model size, dataset size, or sampling budget follows parametric power-law relationships. By quantifying these dependencies in small pilot regimes, practitioners can infer the marginal utility of scaling—and allocate resources beyond the pilot regime—to maximize target metrics subject to constraints.

This paradigm addresses two persistent challenges:

  • Resource allocation: Given fixed training compute or inference budget, how should practitioners split between model size and data volume, or among multiple candidate models?
  • Selection and extrapolation: How can one efficiently select models or states likely to provide optimal downstream results, without exhaustive trial and error?

Early methodologies relied on single power-law fits, finding that train/test loss $L$ falls as $L \sim N^{-\alpha}$ (model size $N$) or $L \sim D^{-\beta}$ (data size $D$), with slow, predictable diminishing returns. The current generation of SLG Search builds upon these forms, introducing mechanisms for accurate extrapolation, phase-transition detection, and optimal resource allocation (Zhang et al., 2023, Lin et al., 2024, Li et al., 1 Feb 2026).

2. Scaling-Law Modeling in Sequential Recommendation

Zhang et al. (Zhang et al., 2023) provide a detailed and validated framework for SLG Search in large sequential recommender models. The core insight is that cross-entropy test loss in decoder-only ID-based Transformers can be represented as the sum of two power-law terms:

$$L(N, D) \approx E + \left(\frac{N_0}{N}\right)^{\alpha_N} + \left(\frac{D_0}{D}\right)^{\alpha_D}$$

where $N$ is the non-embedding parameter count, $D$ is the total interaction (data) count, $\alpha_N$ and $\alpha_D$ are empirical exponents, $N_0, D_0$ are characteristic scales, and $E$ is the irreducible error floor. The fitting procedure entails:

  • Training a small number (3–5) of pilot models at varying $N$ and $D$.
  • Log-transforming $(L, N)$ and/or $(L, D)$ and applying least-squares regression in log–log space.
  • Extracting $\alpha_N$ and $\alpha_D$, which typically satisfy $\alpha_N \sim 0.12$, notably larger than for LLMs ($\sim 0.07$), implying faster returns to scaling in this context.
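The pilot-fit procedure above can be sketched in a few lines of plain Python. The constants here ($\alpha_N = 0.12$, $N_0 = 10^6$, the pilot sizes) are synthetic placeholders used to generate illustrative data, not the fitted values from the paper, and the error floor $E$ is assumed known.

```python
import math

def fit_power_law(xs, losses, floor=0.0):
    """Fit L - floor ~ (x0 / x)^alpha by least squares in log-log space.

    Returns (alpha, x0). `floor` is the irreducible error E, assumed known
    or estimated separately in this simplified sketch.
    """
    logx = [math.log(x) for x in xs]
    logy = [math.log(l - floor) for l in losses]
    n = len(xs)
    mx, my = sum(logx) / n, sum(logy) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(logx, logy)) / \
            sum((a - mx) ** 2 for a in logx)
    intercept = my - slope * mx
    alpha = -slope                  # log L = alpha*log x0 - alpha*log x
    x0 = math.exp(intercept / alpha)
    return alpha, x0

# Synthetic pilot models: losses follow (N0/N)^0.12 with N0 = 1e6
pilot_N = [1e5, 3e5, 1e6, 3e6, 9e6]
pilot_L = [(1e6 / n) ** 0.12 for n in pilot_N]
alpha, N0 = fit_power_law(pilot_N, pilot_L)

# Extrapolate far beyond the pilot regime, e.g. to 829M parameters
predicted = (N0 / 829e6) ** alpha
```

The same closed-form regression applies to the $(L, D)$ sweep, yielding $\alpha_D$ and $D_0$.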

Empirical validation is provided by extrapolating fitted curves from “small-to-medium” models (up to $\sim$9M parameters) to previously untested scales (up to 829M parameters), with predicted and observed loss matching within 1–2%. This demonstrates that the scaling regime is robust across multiple orders of magnitude and can be exploited for resource allocation (Zhang et al., 2023).

3. SLG Search in Model Selection and Fine-Tuning

The proliferation of pre-trained LLMs presents the challenge of efficiently identifying which model to fine-tune, especially when brute-force tuning is prohibitive. "Selecting LLM to Fine-tune via Rectified Scaling Law" (Lin et al., 2024) formalizes this as a prediction task: using limited fine-tuning on small data subsets, estimate a model's potential full-data performance, then select the model with minimum predicted loss.

A central observation is that, unlike pre-training, the fine-tuning loss curve in log–log space exhibits a two-phase structure: a "pre-power" regime (initial, with large and decreasing slope) and a "power phase" (linear/power-law). The authors show that standard single-phase power laws fail to capture this regime transition. The Rectified Scaling Law is introduced:

$$\hat L(D) = \frac{B}{D_l + D^\beta} + E$$

where $D$ is the subset size, $D_l$ captures equivalent pre-learned downstream data from pre-training, $B$ and $E$ are scalars, and $\beta$ is the fine-tuning exponent. This law enables accurate prediction (mean squared error in log–log fit $\sim 0.007$) of extrapolated fine-tuning loss.
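The two-phase shape the rectified law produces can be seen by evaluating it at small and large $D$. The constants below are hypothetical, chosen only to make the phases visible: the curve is nearly flat where $D^\beta \ll D_l$ (pre-power phase) and approaches slope $-\beta$ in log–log space where $D^\beta$ dominates (power phase).

```python
import math

def rectified_loss(d, b, d_l, beta, e):
    """Rectified scaling law: L(D) = B / (D_l + D^beta) + E."""
    return b / (d_l + d ** beta) + e

# Hypothetical fitted constants for one candidate model
B, D_L, BETA, E = 50.0, 200.0, 0.6, 0.8

small = [rectified_loss(d, B, D_L, BETA, E) for d in (10, 20, 40)]
large = [rectified_loss(d, B, D_L, BETA, E) for d in (1e7, 2e7, 4e7)]

def loglog_slope(l1, l2):
    """Log-log slope of (L - E) between two points where D doubles."""
    return (math.log(l2 - E) - math.log(l1 - E)) / math.log(2)

flat_slope = loglog_slope(small[0], small[1])    # near 0: pre-power phase
power_slope = loglog_slope(large[0], large[1])   # near -beta: power phase
```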

The SLG Search ("Accept then Stop", AtS) algorithm operates as follows:

  1. Fine-tune each candidate model $M_i$ on progressively smaller subsets $D$, recording $(\log D, \log L)$.
  2. Fit a linear model in log–log space once enough points (typically $k = 3$) are gathered and the curve enters the power-law regime.
  3. Stop iterating when the latest point deviates by more than a set threshold from the fitted trend (pre-power regime exit).
  4. Use the fitted curve to predict full-data performance for $M_i$.
  5. Select the model with the lowest predicted loss.
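A minimal sketch of the AtS loop, with hypothetical loss curves for two candidate models. The deviation threshold, the curves, and the full-data size are illustrative choices, not the authors' implementation.

```python
import math

def predict_full_loss(points, d_full, k=3, tol=0.05):
    """AtS-style sketch. `points` is [(D, L), ...] ordered from largest to
    smallest subset size. Fit a line in log-log space on the first k points,
    then accept further points until one deviates from the fitted trend by
    more than `tol` (taken as the pre-power regime boundary). Returns the
    predicted loss at d_full."""
    logs = [(math.log(d), math.log(l)) for d, l in points]

    def fit(pts):
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        s = sum((x - mx) * (y - my) for x, y in pts) / \
            sum((x - mx) ** 2 for x, _ in pts)
        return s, my - s * mx

    accepted = logs[:k]
    slope, icpt = fit(accepted)
    for x, y in logs[k:]:
        if abs((slope * x + icpt) - y) > tol:    # left the power phase
            break
        accepted.append((x, y))
        slope, icpt = fit(accepted)
    return math.exp(slope * math.log(d_full) + icpt)

# Two hypothetical candidates: (subset size, fine-tuning loss) pairs
curves = {
    "model_a": [(8192, 0.9), (4096, 1.0), (2048, 1.11), (1024, 1.23)],
    "model_b": [(8192, 1.1), (4096, 1.15), (2048, 1.21), (1024, 1.27)],
}
preds = {m: predict_full_loss(pts, d_full=1_000_000)
         for m, pts in curves.items()}
best = min(preds, key=preds.get)
```

Note that the model with the *steeper* log–log slope (here `model_a`) wins at full-data scale even though both look similar at pilot scale, which is exactly the situation AtS is designed to detect.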

SLG Search achieves a $>100\times$ reduction in compute over naïve methods (e.g., a $\gamma = 1/256$ pilot budget results in $\sim$0.8% of full-tuning compute per model), and maintains high selection quality (relative accuracy $>95\%$, Pearson correlation $\sim$85–90% with true orderings) (Lin et al., 2024).

4. Scaling-Law Guided Search for Test-Time Inference

For stochastic LLMs, test-time strategies such as "best-of-$N$" (BoN) sample multiple completions and select the highest-rewarded one. "Predicting and improving test-time scaling laws via reward tail-guided search" (Li et al., 1 Feb 2026) extends SLG Search to this setting, departing from uniform resource allocation in favor of allocation based on tail-extrapolated estimates.

Given a prompt $x$ and model $\pi$ generating an intermediate state $s$, the reward $R_s$ of the terminal response $y$ (after rolling out $\pi$) follows an empirical distribution $F_s$. SLG Search proceeds in the following steps:

  1. Tail-extrapolation: Model the upper tail of $F_s$ as Gaussian; collect $m$ pilot completions, extract the tail (top $\alpha m$), compute the sample mean and variance, and invert truncated-normal moment formulas to estimate $(\hat\mu, \hat\sigma)$.
  2. Predict scaling law: For large $N$,

$$\hat V_N(s) = \hat\mu + \hat\sigma \sqrt{2 \ln N}$$

predicts the maximum expected reward from $N$ samples.

  3. Two-stage resource allocation:
    • Exploration: For $K$ intermediate states $s_1, \ldots, s_K$, sample $m$ pilot rollouts each and compute $\hat V_N(s_i)$.
    • Exploitation: Allocate all remaining budget to $s_{\hat\imath} = \arg\max_i \hat V_N(s_i)$, sample $N - Km$ completions from $s_{\hat\imath}$, and return the best reward seen.

This approach, under mild conditions, guarantees that SLG Search not only achieves vanishing regret versus the perfect-information oracle as $N \rightarrow \infty$, but also delivers polynomial compute amplification over flat BoN, i.e., it matches BoN at $N^{1+\gamma}$ using only $N$ samples. Empirical validation on math reasoning (AMC, AIME) with contemporary LLMs shows consistent and significant gains (e.g., $\sim$29% total-reward gain on AIME2024+1B at $N = 1000$ over BoN) (Li et al., 1 Feb 2026).
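The two-stage procedure can be sketched as follows. For simplicity this uses plain sample moments in place of the truncated-normal tail inversion, and the state reward distributions (three hypothetical states with Gaussian rewards) are synthetic; the point is that the $\hat\mu + \hat\sigma\sqrt{2\ln N}$ score favors the state with the heaviest upside tail, not the highest mean.

```python
import math
import random

def predict_best_of_n(rewards, n):
    """Predict the expected best reward from n samples, assuming a
    Gaussian reward tail. Simplification: plain sample moments stand in
    for the truncated-normal moment inversion used in the paper."""
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / (len(rewards) - 1)
    return mu + math.sqrt(var) * math.sqrt(2 * math.log(n))

random.seed(0)
# K hypothetical intermediate states: (mean reward, reward std)
states = {"s1": (0.40, 0.05), "s2": (0.30, 0.20), "s3": (0.45, 0.02)}
m, n_total = 32, 1000

# Exploration: m pilot rollouts per state, scored with the tail law
pilots = {s: [random.gauss(mu, sd) for _ in range(m)]
          for s, (mu, sd) in states.items()}
scores = {s: predict_best_of_n(rs, n_total) for s, rs in pilots.items()}

# Exploitation: spend the remaining budget on the predicted-best state
chosen = max(scores, key=scores.get)
remaining = n_total - m * len(states)
best_reward = max(pilots[chosen] +
                  [random.gauss(*states[chosen]) for _ in range(remaining)])
```

Here `s2` has the lowest mean but by far the widest distribution, so the tail-extrapolated score selects it; a flat BoN split across all three states would waste most of its budget on the low-variance states.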

5. Comparative Methodology and Implementation

The essential workflow in SLG Search—across domains—follows this pattern:

  1. Pilot Fitting:
    • Train or evaluate on a set of small models, subsets, or states.
    • Record relevant target metrics (loss, reward) at each scale.
  2. Scaling Law Inference:
    • Fit parametric forms (single, or two-phase power-law, rectified, or tail models) to the empirical data.
    • Validate goodness-of-fit ($R^2 > 0.95$ is typical for loss scaling; RMS-error checks for reward).
  3. Predictive Extrapolation:
    • Using the fitted law, infer the optimal resource allocation (model size $N^*$, data size $D^*$, or state selection) under budgetary constraints.
    • For sequential recommendation under $N D = C$, compute:

    $$N^* = \left[\frac{\alpha_N}{\alpha_D} \left(\frac{C}{D_0}\right)^{\alpha_D} N_0^{\alpha_N}\right]^{1/(\alpha_N + \alpha_D)}, \quad D^* = C/N^*$$

  4. Resource Application and Validation:

    • Allocate resources per the prediction.
    • Train or evaluate, and validate the achieved metric against either the law (for loss, within $\sim$2%) or the actual ranking (for reward/model selection).
  5. Practical Enhancements:
    • For training large models, implement stability advances (e.g., layer-wise adaptive dropout, Adam→SGD).
    • For low-data or cold-start regimes, interpret $D^*$ as “effective unique interactions $\times$ epochs” and monitor diminishing returns from data repetition.
    • For multiple model candidates, parallelize the SLG Search process (Zhang et al., 2023, Lin et al., 2024, Li et al., 1 Feb 2026).
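The compute-optimal allocation in step 3 is a closed form obtained by minimizing $(N_0/N)^{\alpha_N} + (D_0/D)^{\alpha_D}$ subject to $N D = C$. A direct implementation, with placeholder values standing in for fitted constants:

```python
def optimal_allocation(C, alpha_n, alpha_d, n0, d0):
    """Solve min over N of (n0/N)^alpha_n + (d0/D)^alpha_d  s.t.  N*D = C.
    Closed form from setting the derivative in N to zero."""
    n_star = ((alpha_n / alpha_d) * (C / d0) ** alpha_d
              * n0 ** alpha_n) ** (1.0 / (alpha_n + alpha_d))
    return n_star, C / n_star

# Hypothetical fitted constants and compute budget C = N * D
n_star, d_star = optimal_allocation(C=1e15, alpha_n=0.12, alpha_d=0.15,
                                    n0=1e6, d0=1e7)
```

Because $\alpha_N$ and $\alpha_D$ are of similar magnitude in the recommendation setting, the optimum splits the budget between parameters and data rather than concentrating it in either.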

6. Empirical Outcomes and Limitations

Research has established several robust empirical findings:

  • Prediction reliability: SLG-fitted curves predict loss/reward at previously untested large scales to within 1–2% (recommendation), or rank-order models with $\sim$95% accuracy (LLM selection).
  • Resource efficiency: Compute savings of $10^2$–$10^3\times$ are typical versus exhaustive methods.
  • Task-specific gains: In sequential recommendation, larger models disproportionately improve outcomes for cold-start, long-tail, adversarial, and cross-domain settings.
  • Regret guarantees: In test-time inference, SLG Search achieves vanishing regret relative to perfect-information oracles, and outpaces “flat” best-of-$N$ selection by polynomial factors in $N$.

However, several caveats remain:

  • Regime validity: Extrapolation is reliable only within the range validated by pilot sweeps; for $N$ much larger than fitted, embedding collapse or exponent drift may occur.
  • Phase identification: Accurate two-phase modeling is critical in fine-tuning selection; single-phase laws underperform in regimes exhibiting pre-power knees.
  • Data sparsity edge effects: For extremely small $D$, additional heuristics (e.g., increasing $\gamma$ or more sensitive deviation thresholds) may be necessary to avoid underfitting or premature stopping (Zhang et al., 2023, Lin et al., 2024).

7. Practical Guidelines and Cross-Domain Implications

Implementation of SLG Search is straightforward under the provided recipes:

| Application domain | Pilot phase | Scaling law fit | Resource solve | Empirical gain |
|---|---|---|---|---|
| Sequential reco. (Zhang et al., 2023) | 3–5 models, varying $N$/$D$ | Power-law sum | $(N^*, D^*)$ under $ND = C$ | 1–2% fit error; no grid |
| LLM selection (Lin et al., 2024) | Subset fine-tunes per model | Rectified two-phase | AtS, select by predicted loss | $>100\times$ compute cut |
| Test-time LLM inference (Li et al., 1 Feb 2026) | Pilot rollouts per state | Gaussian-tail extrap. | Stagewise allocation | $>25\%$ reward lift |

In all cases, SLG Search provides a principled, statistically grounded mechanism for converting initial pilot regime measurements into actionable resource allocation at scale, drastically reducing experimental cost while maintaining predictivity and control over scaling behavior.

A plausible implication is that, as scaling laws are further generalized and refined, SLG Search and its variants will become a foundational ingredient in neural model development pipelines across domains.
