Scaling-Law Guided Search
- Scaling-Law Guided Search is a methodology that leverages power-law relationships between model size, data volume, and performance to predict optimal configurations.
- It employs pilot experiments and log–log regression to extrapolate scaling curves, reducing computational cost by up to 100× compared to exhaustive grid searches.
- The approach applies across domains—sequential recommendation, LLM fine-tuning, and test-time inference—delivering measurable gains in efficiency and predictive accuracy.
Scaling-Law Guided (SLG) Search is a class of algorithmic methodologies that exploit empirical scaling laws—typically power-law relations between model size, dataset size, and performance metrics—to efficiently and systematically allocate resources in neural network training, model selection, and test-time inference. SLG Search has emerged independently across distinct domains, such as sequential recommendation models, resource-constrained LLM selection, and test-time reward optimization. These techniques obviate brute-force grid search by leveraging fitted scaling-law curves to predict optimal or near-optimal configurations under various compute, data, or evaluation budgets.
1. Foundational Principles and Motivation
The inception of SLG Search is rooted in the observation that many neural model classes (Transformers, LLMs, recommender models) exhibit smooth, empirically measurable scaling laws: the dependence of loss or reward on model size, dataset size, or sampling budget follows parametric power-law relationships. By quantifying these dependencies in small pilot regimes, practitioners can infer the marginal utility of scaling—and allocate resources beyond the pilot regime—to maximize target metrics subject to constraints.
This paradigm addresses two persistent challenges:
- Resource allocation: Given fixed training compute or inference budget, how should practitioners split between model size and data volume, or among multiple candidate models?
- Selection and extrapolation: How can one efficiently select models or states likely to provide optimal downstream results, without exhaustive trial and error?
Early methodologies relied on monolithic interpretation of power-law scaling, finding that train/test loss falls as a power law in model size $N$ ($L \propto N^{-\alpha}$) or data size $D$ ($L \propto D^{-\beta}$), with slow, predictable diminishing returns. The current generation of SLG Search builds upon these forms, introducing mechanisms for accurate extrapolation, phase-transition detection, and optimal resource allocation (Zhang et al., 2023, Lin et al., 2024, Li et al., 1 Feb 2026).
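As a minimal illustration of the extrapolation principle, the sketch below fits a single power law $L = A\,N^{-\alpha}$ to synthetic pilot losses by least squares in log–log space and extrapolates one order of magnitude beyond the pilot regime (all constants are assumed for illustration):

```python
import numpy as np

# Synthetic pilot losses following an assumed power law L(N) = A * N**-alpha
alpha_true, A_true = 0.3, 50.0
N_pilot = np.array([1e5, 3e5, 1e6, 3e6])
L_pilot = A_true * N_pilot ** -alpha_true

# Least-squares fit in log-log space: log L = log A - alpha * log N
slope, intercept = np.polyfit(np.log(N_pilot), np.log(L_pilot), deg=1)
alpha_hat = -slope

# Extrapolate one order of magnitude beyond the pilot regime
N_big = 3e7
L_pred = np.exp(intercept + slope * np.log(N_big))
L_true = A_true * N_big ** -alpha_true
```

On noiseless synthetic data the fitted exponent and the extrapolated loss match the generating law exactly; in practice the quality of such extrapolation is what the pilot-sweep validation step is meant to check.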
2. Scaling-Law Modeling in Sequential Recommendation
Zhang et al. (Zhang et al., 2023) provide a detailed and validated framework for SLG Search in large sequential recommender models. The core insight is that cross-entropy test loss in decoder-only ID-based Transformers can be represented as the sum of two power-law terms:

$$L(N, D) = \left(\frac{N_c}{N}\right)^{\alpha} + \left(\frac{D_c}{D}\right)^{\beta} + L_{\infty}$$

where $N$ is the non-embedding parameter count, $D$ is the total interaction (data) count, $\alpha$ and $\beta$ are empirical exponents, $N_c$ and $D_c$ are characteristic scales, and $L_{\infty}$ is the irreducible error floor. The fitting procedure entails:
- Training a small number (3–5) of pilot models at varying $N$ and $D$.
- Log-transforming $N$ and/or $D$ and applying least-squares regression in log–log space.
- Extracting the exponents $\alpha$ and $\beta$, which in this setting are notably larger than those typically reported for LLMs, implying faster returns to scaling in this context.
Empirical validation is provided by extrapolating fitted curves from “small-to-medium” models (up to 9M parameters) to previously untested scales (up to 829M parameters), with predicted and observed loss matching within 1–2%. This demonstrates that the scaling regime is robust across multiple orders of magnitude and can be exploited for resource allocation (Zhang et al., 2023).
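The pilot-fit-and-extrapolate procedure might be sketched as follows on synthetic data. All ground-truth constants and the pilot grid are assumed for illustration, and SciPy's `curve_fit` stands in for whatever regression machinery the authors used:

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_law(ND, alpha, beta, log_Nc, log_Dc, E):
    """Two-term power law: L = (Nc/N)^alpha + (Dc/D)^beta + E."""
    N, D = ND
    return (np.exp(log_Nc) / N) ** alpha + (np.exp(log_Dc) / D) ** beta + E

# Synthetic "pilot" grid with assumed ground-truth parameters
true = dict(alpha=0.35, beta=0.30, log_Nc=np.log(5e4), log_Dc=np.log(2e6), E=0.5)
N, D = np.meshgrid([1e5, 3e5, 1e6, 3e6], [1e6, 3e6, 1e7])
N, D = N.ravel(), D.ravel()
L = loss_law((N, D), **true)

# Fit the law to the pilot measurements
p0 = [0.3, 0.3, np.log(1e4), np.log(1e6), 0.4]
popt, _ = curve_fit(loss_law, (N, D), L, p0=p0, maxfev=20000)

# Extrapolate well beyond the pilot regime and compare to the generating law
L_pred = loss_law((8e8, 1e8), *popt)
L_true = loss_law((8e8, 1e8), **true)
```

Parameterizing the characteristic scales in log space keeps the optimization well-conditioned across the many orders of magnitude spanned by $N$ and $D$.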
3. SLG Search in Model Selection and Fine-Tuning
The proliferation of pre-trained LLMs presents the challenge of efficiently identifying which model to fine-tune, especially when brute-force tuning is prohibitive. "Selecting LLM to Fine-tune via Rectified Scaling Law" (Lin et al., 2024) formalizes this as a prediction task: using limited fine-tuning on small data subsets, estimate a model's potential full-data performance, then select the model with minimum predicted loss.
A central observation is that, unlike pre-training, the fine-tuning loss curve in log–log space exhibits a two-phase structure: a "pre-power" regime (initial, with a shallow, still-changing slope) and a "power phase" (linear in log–log space, i.e., power-law). The authors show that standard single-phase power laws fail to capture this regime transition. The Rectified Scaling Law is introduced:

$$L(D) = \frac{B}{(D_l + D)^{\beta}} + E$$

where $D$ is the subset size, $D_l$ captures the equivalent pre-learned downstream data from pre-training, $B$ and $E$ are scalars, and $\beta$ is the fine-tuning exponent. This law enables accurate prediction of extrapolated fine-tuning loss, with low mean squared error in the log–log fit.
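The two-phase shape can be checked numerically: under the rectified form, the local log–log slope of the loss curve is near zero for $D \ll D_l$ (pre-power regime) and approaches $-\beta$ for $D \gg D_l$ (power phase). A small sketch with assumed constants:

```python
import numpy as np

B, D_l, beta, E = 10.0, 1e4, 0.3, 0.0  # assumed constants for illustration

def rectified_loss(D):
    """Rectified scaling law L(D) = B / (D_l + D)^beta + E."""
    return B / (D_l + D) ** beta + E

def local_slope(D, eps=1e-4):
    """Numerical d(log L)/d(log D) at subset size D."""
    logD = np.log(D)
    up = np.log(rectified_loss(np.exp(logD + eps)))
    dn = np.log(rectified_loss(np.exp(logD - eps)))
    return (up - dn) / (2 * eps)

slope_pre = local_slope(10.0)   # deep in the pre-power regime: ~0
slope_pow = local_slope(1e8)    # deep in the power phase: ~ -beta
```

Analytically the slope is $-\beta D / (D_l + D)$, which interpolates smoothly between the two regimes; the knee around $D \approx D_l$ is exactly what single-phase power laws fail to capture.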
The SLG Search ("Accept then Stop", AtS) algorithm operates as follows:
- Fine-tune each candidate model on progressively smaller subsets $D_1 > D_2 > \cdots$, recording the loss $L(D_i)$ at each.
- Fit a linear model in log–log space once enough points are gathered and the curve enters the power-law regime.
- Stop iterating when the latest point deviates by more than a set threshold from the fitted trend (exit into the pre-power regime).
- Use the fitted curve to predict full-data performance at $D_{\text{full}}$.
- Select the model with the lowest predicted loss.
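A toy sketch of this loop on a simulated loss curve (the rectified-law constants, subset schedule, and deviation threshold are all assumed; real usage would record measured fine-tuning losses instead of calling `loss`):

```python
import numpy as np

B, D_l, beta = 10.0, 1e3, 0.3           # assumed ground-truth constants
loss = lambda D: B / (D_l + D) ** beta  # stands in for measured fine-tuning loss

subsets = [1e5, 5e4, 2e4, 1e4, 5e3, 2e3, 1e3]  # progressively smaller pilots
tau = 0.05                                      # log-space deviation threshold

# Seed the fit with the first few (largest, power-phase) points
logD = [np.log(D) for D in subsets[:3]]
logL = [np.log(loss(D)) for D in subsets[:3]]
fit = np.polyfit(logD, logL, 1)

for D in subsets[3:]:
    pred = np.polyval(fit, np.log(D))
    if abs(np.log(loss(D)) - pred) > tau:   # left the power phase: stop
        break
    logD.append(np.log(D))                  # accept the point and refit
    logL.append(np.log(loss(D)))
    fit = np.polyfit(logD, logL, 1)

# Extrapolate the accepted power-phase fit to the full dataset size
D_full = 1e7
L_pred = np.exp(np.polyval(fit, np.log(D_full)))
L_true = loss(D_full)
```

Running this per candidate model and ranking by `L_pred` gives the selection step; the early stop is what keeps the pilot compute a small fraction of full fine-tuning.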
SLG Search achieves a substantial reduction in compute over naïve methods (the pilot budget amounts to only a small fraction of full-tuning compute per model), and maintains high selection quality (relative accuracy near 95%, with high Pearson correlation against the true orderings) (Lin et al., 2024).
4. Scaling-Law Guided Search for Test-Time Inference
For stochastic LLMs, test-time strategies such as "best-of-$n$" (BoN) sample multiple completions and select the highest-rewarded one. "Predicting and improving test-time scaling laws via reward tail-guided search" (Li et al., 1 Feb 2026) extends SLG Search to this setting, departing from uniform resource allocation in favor of allocation based on tail-extrapolated estimates.
Given a prompt $x$ and model $\pi$, generating an intermediate state $s$, the reward of the terminal response (after rolling out from $s$) follows an empirical distribution $R_s$. SLG Search leverages the following steps:
- Tail-extrapolation: Model the upper tail of $R_s$ as Gaussian; collect pilot completions, extract the tail (the top fraction of rewards), compute its sample mean and variance, and invert truncated-normal moment formulas to estimate the tail parameters $(\mu_s, \sigma_s)$.
- Predict scaling law: For large $n$,

$$\mathbb{E}\Big[\max_{1 \le i \le n} r_i\Big] \approx \mu_s + \sigma_s \sqrt{2 \ln n}$$

predicts the maximum expected reward attainable from $n$ samples.
- Two-stage resource allocation:
  - Exploration: For candidate intermediate states $s_1, \dots, s_k$, sample pilot rollouts from each and compute the predicted best reward $\hat{V}(s_i) = \hat{\mu}_{s_i} + \hat{\sigma}_{s_i}\sqrt{2 \ln n}$.
  - Exploitation: Allocate all remaining budget to $s^{*} = \arg\max_i \hat{V}(s_i)$, sample rollouts from it, and return the best reward seen.
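The exploration/exploitation loop can be illustrated with a toy simulation. The two states and their reward parameters are hypothetical, and for simplicity the truncated-normal tail inversion is replaced by plain mean/std estimates over the pilot rollouts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical intermediate states: B has the higher mean reward,
# but A's heavier spread gives it the better best-of-n tail.
states = {"A": (0.5, 0.30), "B": (0.6, 0.05)}  # assumed (mu, sigma) per state
n_pilot, n_budget = 64, 256

# Exploration: pilot rollouts per state, scored by mu + sigma * sqrt(2 ln n)
scores = {}
for name, (mu, sigma) in states.items():
    pilot = rng.normal(mu, sigma, n_pilot)
    scores[name] = pilot.mean() + pilot.std() * np.sqrt(2 * np.log(n_budget))

# Exploitation: spend the whole remaining budget on the best-scoring state
best = max(scores, key=scores.get)
mu, sigma = states[best]
best_reward = rng.normal(mu, sigma, n_budget).max()
```

Note that the tail-guided score correctly prefers the high-variance state even though its mean reward is lower: under best-of-$n$, what matters is the upper tail, not the average.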
This approach, under mild conditions, guarantees that SLG Search not only achieves vanishing regret versus the perfect-information oracle as the total sampling budget grows, but also delivers polynomial compute amplification over flat BoN, i.e., it matches the reward BoN achieves with $n$ samples while using polynomially fewer. Empirical validation on math reasoning (AMC, AIME) with contemporary LLMs shows consistent and significant gains (e.g., a 29% total-reward gain on AIME2024+1B over BoN at a matched budget) (Li et al., 1 Feb 2026).
5. Comparative Methodology and Implementation
The essential workflow in SLG Search—across domains—follows this pattern:
- Pilot Fitting:
- Train or evaluate on a set of small models, subsets, or states.
- Record relevant target metrics (loss, reward) at each scale.
- Scaling Law Inference:
- Fit parametric forms (single- or two-phase power laws, rectified, or tail models) to the empirical data.
- Validate goodness-of-fit (high $R^2$ is typical for loss scaling; RMS-error checks for reward).
- Predictive Extrapolation:
- Using the fitted law, infer the optimal resource allocation (model size $N$, data size $D$, or state selection) under budgetary constraints.
- For sequential recommendation under a fixed compute budget $C$, compute:

$$(N^{*}, D^{*}) = \arg\min_{\mathrm{Compute}(N, D) \le C} L(N, D)$$

- Resource Application and Validation:
- Allocate resources per the prediction.
- Train or evaluate, validate achieved metric against either the law (for loss, within 2%) or actual ranking (for reward/model selection).
- Practical Enhancements:
- For training large models, implement stability advances (e.g., layer-wise adaptive dropout, Adam→SGD switching).
- For low-data or cold-start regimes, interpret $D$ as effective unique interactions × epochs and monitor diminishing returns from data repetition.
- For multiple model candidates, parallelize the SLG Search process (Zhang et al., 2023, Lin et al., 2024, Li et al., 1 Feb 2026).
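The resource-solve step reduces to a one-dimensional minimization once fitted constants are in hand. In the sketch below, all constants are assumed, and the standard Transformer estimate $C \approx 6ND$ (a common approximation, not specific to the cited papers) is used to couple $N$ and $D$ along the budget constraint:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed fitted scaling-law constants
alpha, beta, Nc, Dc, L_inf = 0.35, 0.30, 5e4, 2e6, 0.5
C = 1e18  # total training compute budget (FLOPs), assumed

def loss_at(logN):
    """Loss along the budget constraint D = C / (6N)."""
    N = np.exp(logN)
    D = C / (6 * N)
    return (Nc / N) ** alpha + (Dc / D) ** beta + L_inf

res = minimize_scalar(loss_at, bounds=(np.log(1e5), np.log(1e10)), method="bounded")
N_opt = np.exp(res.x)
D_opt = C / (6 * N_opt)
```

At the optimum the marginal returns of the two terms balance, $\alpha (N_c/N)^{\alpha} = \beta (D_c/D)^{\beta}$, which serves as a quick sanity check on the solver output.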
6. Empirical Outcomes and Limitations
Research has established several robust empirical findings:
- Prediction reliability: SLG-fitted curves predict loss/reward at previously untested large scales to within 1–2% (recommendation), or recover rank order with roughly 95% accuracy (LLM selection).
- Resource efficiency: Compute savings of one to two orders of magnitude are typical versus exhaustive methods.
- Task-specific gains: In sequential recommendation, larger models disproportionally improve outcomes for cold-start, long-tail, adversarial, and cross-domain settings.
- Regret guarantees: In test-time inference, SLG Search achieves vanishing regret relative to perfect-information oracles, and outpaces "flat" best-of-$n$ selection by polynomial factors in the sampling budget.
However, several caveats remain:
- Regime validity: Extrapolation is reliable only within the range validated by pilot sweeps; for $N$ or $D$ much larger than the fitted range, embedding collapse or exponent drift may occur.
- Phase identification: Accurate two-phase modeling is critical in fine-tuning selection; single-phase laws underperform in regimes exhibiting pre-power knees.
- Data sparsity edge effects: For extremely small datasets, additional heuristics (e.g., larger pilot sets or more sensitive deviation thresholds) may be necessary to avoid underfitting or premature stopping (Zhang et al., 2023, Lin et al., 2024).
7. Practical Guidelines and Cross-Domain Implications
Implementation of SLG Search is straightforward under the provided recipes:
| Application domain | Pilot phase | Scaling law fit | Resource solve | Empirical gain |
|---|---|---|---|---|
| Sequential reco. (Zhang et al., 2023) | 3–5 models, varying $N$/$D$ | Power-law sum | $(N^{*}, D^{*})$ under budget $C$ | 1–2% fit error; no grid search |
| LLM selection (Lin et al., 2024) | Subset fine-tunes per model | Rectified two-phase | AtS, select by predicted loss | Large compute cut |
| Test-time LLM inference (Li et al., 1 Feb 2026) | Pilot rollouts per state | Gaussian-tail extrapolation | Stagewise allocation | Up to 29% reward lift |
In all cases, SLG Search provides a principled, statistically grounded mechanism for converting initial pilot regime measurements into actionable resource allocation at scale, drastically reducing experimental cost while maintaining predictivity and control over scaling behavior.
A plausible implication is that, as scaling laws are further generalized and refined, SLG Search and its variants will become a foundational ingredient in neural model development pipelines across domains.