Rectified Scaling Law

Updated 1 July 2025

Rectified scaling laws are a framework describing how system performance improves with resources, where the scaling exponent or form changes depending on system parameters and conditions.
These laws explain phenomena like the varying exponents of heat transport in turbulent convection or the decay rate of excess risk in statistical learning models.
Rectified scaling provides a unified theory anchoring diverse observed scaling behaviors to underlying system structure and parameters, improving prediction and resource allocation in scientific and engineering design.

A rectified scaling law describes how the performance of models or systems improves with increasing resource allocation (model size, data, compute, signal resolution), with the key refinement that the scaling exponent, regime, or even qualitative form of this law varies systematically with underlying system parameters and boundary conditions. The concept arises in fields ranging from turbulent convection and communication theory to statistical learning and neural networks, reflecting that scaling exponents are generally neither universal nor immutable, but determined by physical, statistical, or information-theoretic constraints. This article surveys the theoretical foundations, analytical regimes, formulaic relations, and empirical implications of rectified scaling laws across major domains.

1. Foundational Principles of Rectified Scaling Laws

Rectified scaling laws generalize classic scaling relationships by making them regime-dependent and sensitive to latent physical or computational variables. Rather than postulating a universal power-law (e.g., $y \sim x^\beta$ with fixed $\beta$ ), rectified scaling identifies families of exponents or transition points whose values depend on underlying structure, such as boundary layer ordering in turbulence, intrinsic data geometry, system dimension, or hardware limitations.

Mathematically, rectified scaling laws express $Y \sim X^{\beta}$ or $Y \sim f(X; \theta)$ , where the exponent $\beta$ and/or the function $f$ is not fixed, but is a piecewise or parameter-dependent mapping. Classical examples illustrate that naive universal exponents (e.g., $1/3$ for heat transport, linear for channel symbols) fail to account for observed experimental or computational diversity; rectified scaling supplies a unified theory resolving these discrepancies.

2. Applications in Turbulent Thermal Convection

In turbulent Rayleigh-Bénard convection, the rectified scaling law describes how the Nusselt number ( $Nu$ , dimensionless heat flux) scales with the Rayleigh number ( $Ra$ ) and Prandtl number ( $Pr$ ) across multiple physical regimes. In Dubrulle’s solvable turbulent model (Dubrulle, 2011), the scaling exponent $\beta$ in the relation

$Nu \sim Ra^{\beta}$

varies with the relative sizes of the viscous, thermal, and integral length scales (classified by a parameter $a$ ) as well as with $Pr$ and $Ra$ . Key regime-dependent exponents include:

Classical 1/3 law ( $a=1$ ): $Nu \sim Ra^{1/3} Pr^{-1}$ for matched thermal and viscous boundary layers.
2/7 law ( $a=2/5$ ): $Nu \sim Ra^{2/7}$ for thermal layer at the midpoint of the inertial range.
Novel $4/13$ law ( $a=2/3$ ): $Nu \sim Ra^{4/13} \approx Ra^{0.308}$ when the thermal scale becomes the Taylor microscale.
Lower exponents ($1/4$, $1/5$) for low-Prandtl-number (e.g., mercury) flows.

The rectification is explicit: the exponent $\beta$ is a function of $a$ and shifts with $Pr$ or as critical Reynolds numbers are crossed, encapsulated in a general form

$Nu \sim Ra^{\frac{1}{3-4(1-a)}}$

with corresponding transitions between laminar and turbulent regimes.

3. Communication Channel Capacity in the Low-Noise Limit

In analog communication channels, the rectified scaling law captures the transition of the optimal codebook from discrete symbols to a continuum as noise decreases (Abbott et al., 2017). The result establishes a scaling for the optimal discrete input support size $K$ as a function of the Fisher information length $L$ and mutual information $I$ : $K \sim L^{4/3}, \quad \log K \sim I^{4/3}$ The exponent $4/3$ (or equivalently $\zeta=3/4$ in $I \sim \zeta \log K$ ) is rational and non-integer. Unlike the naive expectation of linear scaling ( $K\sim L$ ), the rectified law demonstrates that convergence to the continuous limit is superlinear with diminishing noise. For higher dimensions, $K_{\mathrm{tot}} \sim L^{D/\zeta}$ .

This scaling directly informs the quantization of parameter estimation, optimal code design, prior selection in Bayesian inference, and evolutionary signaling, and emphasizes the gradual, not abrupt, approach of discrete to continuous optimality.

4. Rectified Scaling Laws in Statistical Learning

Rectified scaling laws appear widely in learning theory, encompassing regression, kernel methods, and neural networks. The scaling exponent for test loss or excess risk depends intricately on data spectrum, regularization, feature learning, and algorithmic dynamics.

In quadratically parameterized linear regression with SGD (Ding et al., 13 Feb 2025), the excess risk under infinite-dimensional data and polynomial spectral decay is: $\mathcal{R}_M(v^T) - [\xi^2] \lesssim \frac{1}{M^{\beta-1}} + \frac{\sigma^2 D}{T} + \left[ \frac{D}{T} + \frac{1}{D^{\beta-1}}\mathds{1}_{D<M} \right]$ where the precise decay rate (the scaling order in $T$ ) is: $\begin{cases} T^{-1+1/\beta} & \text{if } \alpha \leq \beta \ T^{-(2\beta-2)/(\alpha+\beta)} & \text{if } \alpha > \beta \end{cases}$ The presence of feature learning via quadratic parameterization enables the algorithm to match or surpass the information-theoretic lower bound in challenging alignment regimes, rectifying the limitations of linear models and fixed-feature NTK regimes.

Similarly, in multiple and kernel regression (Chen et al., 3 Mar 2025), under decaying spectrum $\lambda_i = \Theta(i^{-a})$ , the generalization error obeys: $\mathbb{E}[L] = \sigma^2 + \Theta\left(M^{1-a}\right) + \Theta\left((N_{\mathrm{eff}}\gamma)^{-(a-1)/a}\right)$ (The exponents and prefactors are fully parameterized by $a$ .) This confirms that power-law scaling is not unique to highly nonlinear models but reflects spectral properties, and lends a theoretical rationale to rectified scaling in LLMs and high-dimensional neural networks.

5. Rectified Scaling in Growth, Recommendation, and Forecasting Systems

Rectified scaling laws extend naturally to complex systems beyond physics and learning models:

Complex Growth Systems: Growth in ecological, biological, and urban systems follows an $n/(n+1)$ scaling law ( $G \sim M^{n/(n+1)}$ ), where $n$ is the spatial dimension (Zhao, 2022). The law is rectified by the system's dimensionality, resolving empirical exponents (e.g., $2/3$ in 2D, $3/4$ in 3D) as a geometric consequence.
Sequential Recommendation Models: Power-law scaling governs the relation between test loss and both non-embedding parameter count and data size, with exponents dependent on data sparsity, model width/depth, and task hardness (Zhang et al., 2023). Rectification manifests in the plateauing of performance at irreducible entropy under extreme data or parameter limits.
Time Series Forecasting: In forecasting, the scaling law is rectified by joint effects of data size, model capacity, and look-back horizon (input window): beyond a data-dependent optimal horizon, extending context or model size can degrade performance due to undersampling and feature decay (Shi et al., 24 May 2024).

6. Mechanistic Interpretation and Unified Theory

Across these domains, the mechanism underlying rectified scaling is a shift in the limiting factor for system performance—boundary layer regime in convection; distinguishability under noise in channels; spectrum or target alignment in learning; intrinsic dimension in growth systems. Scaling exponents and even the qualitative scaling form (power, logarithmic, etc.) are thus "rectified" by system and algorithm parameters.

Notably, in statistical learning and deep networks, recent theory unifies rectified scaling as the minimization over candidate exponents set by discrete "quanta" (task clusters or subtasks, e.g., Zipfian distributed) or compact data manifolds, leading to explicit formulae for the best-attainable scaling rate (Brill, 10 Dec 2024): $\mathcal{L} \propto N^{-\gamma},\quad \gamma = \min\{\alpha, c/D\}$

$\mathcal{L} \propto \mathcal{D}^{-\delta},\quad \delta = \min\left\{ \frac{\alpha}{1+\alpha}, \frac{c}{D} \right\}$

This rectification principle—scaling determined by the "least favorable" exponent among the relevant mechanisms—offers a broad theoretical and practical template.

7. Practical Consequences and Implications

Rectified scaling laws have direct implications for experimental design, engineering, and modeling:

Prediction and Planning: Accurate forecasting of required resources (data, model size) for target performance, and recognition that performance plateaus are regime-dependent.
Resource Allocation: Guidance for optimal trade-off between scaling in model/data/compute versus problem-specific dimension, spectrum, or turbulence regime.
Design Correction: Correction of "one-size-fits-all" universal power laws; motivates parameter-sensitive, context-aware modeling and system design.
Empirical Reconciliation: Explains and reconciles disparate observed exponents and apparent breakdowns of classic power-law scaling in empirical settings.

The rectified scaling law framework anchors the observed diversity of scaling exponents in the fundamental physical, statistical, or algorithmic structure of the system, offering unified, predictive, and empirically validated explanations across a wide array of scientific and technological disciplines.