Shannon Scaling Law Overview

Updated 4 July 2026

Shannon Scaling Law is an asymptotic framework that relates Shannon-theoretic quantities, such as entropy and mutual information, to resource-dependent corrections.
It unifies diverse scaling formulations across domains, including wireless throughput limits, molecular communication, quantum many-body systems, and neural network capacity.
The law’s applications hinge on model-specific parameters like noise levels, basis dependence, and scaling exponents, emphasizing conditional linearity and logarithmic corrections.

The expression Shannon Scaling Law is used in several distinct but structurally related ways across the literature. Across the cited works, it denotes asymptotic relations in which a Shannon-theoretic quantity (typically Shannon entropy, Rényi entropy, mutual information, or channel capacity) scales with a control parameter such as bandwidth, signal-to-noise ratio, wavelength, number of symbols, model size, token count, molecule budget, or finite system size. In some settings the law is a capacity statement of the form “throughput is the minimum of available sessions and physical degrees of freedom”; in others it is an entropy law such as $S(t)=A+\delta \ln t$ or a universal logarithmic correction controlled by Nambu–Goldstone mode counting; and in still others it characterizes how optimal discrete codes approach a continuous limit or how model performance turns non-monotonic once noise dominates signal (Lee et al., 2010, Abbott et al., 2017, Misguich et al., 2016, Ouyang et al., 22 May 2026).

1. Scope and canonical forms

The main usages represented in the literature can be organized by the underlying Shannon quantity and by the resource whose scaling is studied.

Domain	Quantity	Scaling law
Wireless ad hoc networks	Aggregate throughput $T(n,\lambda)$	$T(n,\lambda)\asymp \min\{n, D/\lambda\}$
Molecular communication	Mutual information	$\Theta(\log t)$, $\Theta(\log m)$, or $\Theta(t)=\Theta(m)$
Amplitude-constrained analog channels	Capacity vs optimal alphabet size	$I\sim \frac{3}{4}\log K$
Fractional diffusion	Shannon entropy of evolving PDF	$S(t)=A+\delta \ln t$
2D broken-symmetry quantum systems	Shannon–Rényi entropy	$\mathrm{const}\cdot N \pm \mathrm{const}\cdot \ln N$
Quasiparticle states in quantum chains	Local-basis Shannon entropy	$S_K(\ell)=|K|x\log L+\delta S_K(x)$
Perceptron networks	Sample capacity	Linear in the number of trainable parameters
LLM training under perturbation	Capacity proxy / loss	Shannon–Hartley-type capacity; loss as a reciprocal map of it

What unifies these statements is not a single invariant formula, but a common asymptotic logic: an information measure is expressed as a leading term plus a resource-dependent correction whose exponent or prefactor is controlled by geometry, noise, discreteness, or low-energy mode counting. This suggests that “Shannon Scaling Law” functions as a cross-domain label for Shannon-theoretic asymptotics rather than as a uniquely standardized theorem.

2. Capacity scaling in communication systems

In wireless ad hoc networks, Lee and Chung formulate a Shannon-style scaling law by combining Shannon capacity with Maxwellian degrees-of-freedom (DoF) limits. For $n$ randomly distributed nodes, the spatial DoF scales as $D/\lambda$, where $D$ is the network diameter and $\lambda$ the wavelength. Consequently, the aggregate throughput satisfies, to within an arbitrarily small exponent and polylog factors,

$T(n,\lambda)\asymp \min\{n, D/\lambda\}$

Linear scaling is recovered only when the wavelength shrinks sufficiently fast: $\lambda=O(n^{-1})$ in dense networks (fixed area, so $D=\Theta(1)$) and $\lambda=O(n^{-1/2})$ in extended networks (area growing linearly in $n$, so $D=\Theta(\sqrt{n})$). Otherwise the DoF limit dominates. The achievability mechanism is modified hierarchical cooperation, in which only a subset of nodes in a cluster participates in inter-cluster MIMO so that the transmit dimension matches the Maxwell-induced MIMO rank (Lee et al., 2010).
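The two regimes of the throughput law can be illustrated with a minimal sketch. It evaluates only the leading-order expression $\min\{n, D/\lambda\}$ and ignores constants, the arbitrarily small exponent, and polylog factors; the function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
def throughput_order(n: int, diameter: float, wavelength: float) -> float:
    """Leading-order aggregate throughput min(n, D / lambda).

    Constants, the small exponent, and polylog factors are deliberately ignored.
    """
    if n <= 0 or diameter <= 0 or wavelength <= 0:
        raise ValueError("n, diameter, and wavelength must be positive")
    return min(float(n), diameter / wavelength)


def limiting_regime(n: int, diameter: float, wavelength: float) -> str:
    """Report which term of the minimum is active."""
    dof = diameter / wavelength
    return "node-limited" if n <= dof else "DoF-limited"


if __name__ == "__main__":
    # Hypothetical network: 10,000 nodes, diameter 100 (arbitrary length units).
    for wavelength in (1.0, 0.001):
        print(wavelength,
              throughput_order(10_000, 100.0, wavelength),
              limiting_regime(10_000, 100.0, wavelength))
```

With the longer wavelength the DoF term is the smaller one, so adding nodes does not raise the leading-order throughput; with the shorter wavelength the node count is the binding term.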

In molecular communication, the exact Shannon capacity remains open, so the scaling law is formulated at the level of mutual information order growth. For point-to-point diffusion-based communication with $t$ time slots and $m$ molecules, the single-resource laws are logarithmic: $\Theta(\log t)$ when only time grows and $\Theta(\log m)$ when only the molecule budget grows. When both resources scale proportionally, $m=\Theta(t)$, the law becomes linear: $\Theta(t)=\Theta(m)$. The lower bounds are obtained by explicit signaling constructions together with Fano and auxiliary-channel arguments, while upper bounds use counting and entropy inequalities. The analogy to Shannon–Hartley is explicit: increasing only one constrained resource gives logarithmic returns, whereas growing time and molecule budget together yields linear information growth (Eckford et al., 2014).

A different capacity-style law appears in amplitude-constrained analog channels. In the low-noise or data-rich limit, the capacity-achieving input distribution remains discrete at finite noise but becomes increasingly dense as the effective noise falls. Abbott and Machta show that the optimal number of symbols $K$ and the capacity $I$ are related by

$I\sim \frac{3}{4}\log K$

In the Gaussian and Bernoulli settings considered there, the derivation proceeds through an entropy expansion for a comb of mass points, a variational entropy density for the local mass-point density, and an asymptotic law expressed through the Fisher proper length. The result quantifies the discrete-to-continuous transition of optimal priors under bounded-support constraints (Abbott et al., 2017).

3. Entropy scaling in stochastic diffusion and heavy-tailed dynamics

In fractional diffusion driven by $\alpha$-stable Lévy noise, the Shannon scaling law takes a direct entropic form. If the one-point PDF obeys the self-similar ansatz

$P(x,t)=t^{-\delta}F\!\left(x/t^{\delta}\right)$

then substitution into the Shannon entropy

$S(t)=-\int P(x,t)\ln P(x,t)\,dx$

gives, after the change of variables $y=x/t^{\delta}$,

$S(t)=A+\delta \ln t, \qquad A=-\int F(y)\ln F(y)\,dy$

For the early-time regime of the fractional Langevin equation, the paper identifies $\delta$ with the growth exponent, so the slope of $S(t)$ against $\ln t$ measures that exponent directly. In one dimension the exponents are tied together by a relation given in the paper.

The stationary one-point PDF is heavy-tailed, so the framework is designed precisely for regimes in which variance-based diagnostics may fail (Nezhadhaghighi, 2017).

The methodological role of diffusion entropy analysis (DEA) is central. Instead of extracting scaling from the global width alone, DEA estimates the slope of $S(t)$ versus $\ln t$, thereby using the full one-point PDF. This is significant in the Lévy-stable regime $\alpha<2$, where higher moments diverge and only fractional moments remain finite. The numerical analysis reported in the paper shows consistency between the analytic prediction for the scaling exponent, width-based measurements, local roughness scaling, and the DEA slope (Nezhadhaghighi, 2017).
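The slope-fitting step of DEA can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes a Gaussian PDF whose width grows as $t^{\delta}$ as a stand-in for the Lévy-stable PDFs studied there (chosen only because its density is available in closed form), computes the entropy by a Riemann sum, and fits the slope of $S(t)$ against $\ln t$ by ordinary least squares. All function names and grid parameters are assumptions.

```python
import math


def gaussian_pdf(x: float, width: float) -> float:
    """Gaussian density with standard deviation `width` (illustrative stand-in)."""
    return math.exp(-0.5 * (x / width) ** 2) / (width * math.sqrt(2.0 * math.pi))


def shannon_entropy(width: float, n_grid: int = 4001, span: float = 10.0) -> float:
    """Riemann-sum estimate of S = -integral of p ln p over [-span*width, span*width]."""
    dx = 2.0 * span * width / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = -span * width + i * dx
        p = gaussian_pdf(x, width)
        if p > 0.0:  # skip any underflowed tail values, where p ln p -> 0
            total -= p * math.log(p) * dx
    return total


def dea_slope(times, delta: float) -> float:
    """Least-squares slope of S(t) versus ln t when the PDF width grows as t**delta."""
    log_t = [math.log(t) for t in times]
    entropy = [shannon_entropy(t ** delta) for t in times]
    mean_x = sum(log_t) / len(log_t)
    mean_y = sum(entropy) / len(entropy)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_t, entropy))
    den = sum((x - mean_x) ** 2 for x in log_t)
    return num / den


if __name__ == "__main__":
    times = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
    print(dea_slope(times, delta=0.5))  # fitted slope should recover delta
```

With empirical data, the analytic PDF would be replaced by a histogram of the trajectories at each time, and the bin width would need the usual care; the fit itself is unchanged.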

In this usage, the Shannon scaling law is not a channel-capacity law but an entropy-dilation law for evolving probability distributions. The underlying logic is nevertheless recognizably Shannonian: scaling is inferred from how the information content of a normalized PDF changes under self-similar dilation.

4. Shannon–Rényi finite-size scaling in quantum many-body systems

In two-dimensional quantum antiferromagnets with spontaneously broken continuous symmetry, Misguich, Pasquier, and Oshikawa derive a universal finite-size scaling law for basis-dependent Shannon and Rényi entropies. For a pure ground state $|\psi\rangle$ expanded in a computational basis $\{|i\rangle\}$, the probabilities are $p_i=|\langle i|\psi\rangle|^2$, with

$S_n=\frac{1}{1-n}\ln\sum_i p_i^{\,n}, \qquad S_1=-\sum_i p_i\ln p_i$
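For concreteness, the basis-dependent entropies can be evaluated numerically from any probability vector. The sketch below assumes the standard definitions $S_n=\frac{1}{1-n}\ln\sum_i p_i^n$, with $S_1=-\sum_i p_i\ln p_i$ as the $n\to1$ limit and $S_\infty=-\ln p_{\max}$ as the $n\to\infty$ limit; the function name and tolerances are illustrative assumptions.

```python
import math
from typing import Sequence


def renyi_entropy(probs: Sequence[float], n: float) -> float:
    """Shannon-Renyi entropy S_n (natural log) of a probability vector.

    n == 1 returns the Shannon entropy; n == math.inf returns -ln(max p).
    """
    if any(p < 0.0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    if n < 0.0:
        raise ValueError("the Renyi index n must be non-negative")
    if math.isinf(n):
        return -math.log(max(probs))
    if abs(n - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.log(sum(p ** n for p in probs if p > 0.0)) / (1.0 - n)


if __name__ == "__main__":
    # Toy distribution over three basis states (not a physical ground state).
    probs = [0.5, 0.25, 0.25]
    for n in (0.5, 1.0, 2.0, math.inf):
        print(n, renyi_entropy(probs, n))
```

In the many-body setting the vector has one entry per basis configuration, so exact enumeration is limited to small systems; the cited work relies on field theory, QMC, and DMRG instead.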

On a torus, for systems with $N_{\mathrm{NG}}$ type-A Nambu–Goldstone modes and total site number $N$, the large-$N$ laws are

$S_{n>1}\simeq O(N)+\frac{N_{\mathrm{NG}}}{4}\,\frac{n}{n-1}\ln N, \qquad S_1\simeq O(N)-\frac{N_{\mathrm{NG}}}{4}\ln N$

For $U(1)$ or XY symmetry breaking, $N_{\mathrm{NG}}=1$; for $SU(2)$ Heisenberg antiferromagnets, $N_{\mathrm{NG}}=2$. The basis choice is essential: the analysis uses a basis aligned with a candidate ordering direction, so the universal logarithm is explicitly basis dependent (Misguich et al., 2016).

The physical mechanism is a superposition of two contributions. The oscillator sector of the massless free-field theory gives a universal logarithmic correction controlled by the determinant of the Laplacian. The zero-mode sector contributes through the finite-size restoration of symmetry: although the exact finite-volume ground state is rotationally invariant, the thermodynamic broken-symmetry manifold gives rise to Anderson’s tower of states. Phase-space counting yields a degeneracy factor that contributes a further term to the entropy. For $n>1$, replica-boundary mass relevance locks the replicas and produces the factor $n/(n-1)$; for $n=1$, the boundary mass is symmetry-forbidden and only the oscillator contribution remains with negative sign (Misguich et al., 2016).

The same framework extends beyond the full torus. For a line subsystem embedded in the two-dimensional system, the paper derives an analogous expression with its own logarithmic term, whereas the oscillator contribution alone has no such term. On cylinders, the corresponding expression depends on the aspect ratio. QMC on tori and lines, and DMRG on square-lattice cylinders, are reported to support these logarithmic coefficients and aspect-ratio dependences (Misguich et al., 2016).

5. Local-basis Shannon scaling in quasiparticle states

A distinct quantum usage appears in quasiparticle excited states of free bosonic and fermionic chains, and in the ferromagnetic phase of the spin-$\frac{1}{2}$ XXX chain. Here the relevant quantities are the total-system Shannon entropy, the subsystem Shannon entropy for a block of $\ell$ sites, and the subsystem Shannon mutual information between the block and its complement,

$M_K(\ell)=S_K(\ell)+S_K(L-\ell)-S_K(L)$

In the scaling limit $L\to\infty$, $\ell\to\infty$, with $x=\ell/L$ fixed, the subsystem law for a multiparticle state labeled by the momentum set $K$ takes the generic form

$S_K(\ell)=|K|x\log L+\delta S_K(x)$

which implies a finite mutual information in the scaling limit, because the $\log L$ terms of the block ($x$), its complement ($1-x$), and the full system ($x=1$) cancel (Ye et al., 2023).

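For a single-particle state ($|K|=1$), with $x=\ell/L$, the subsystem Shannon entropy is

$S_K(\ell)=x\log L-(1-x)\log(1-x)$

and the subsystem Shannon mutual information is

$-x\log x-(1-x)\log(1-x)$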
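The single-particle case can be checked by direct enumeration. The sketch below assumes one particle spread uniformly over $L$ sites and measured in the site-occupation basis, so the block outcomes are "particle on one of the $\ell$ block sites" (probability $1/L$ each) or "block empty" (probability $1-\ell/L$). Under that assumption the block entropy has the form $x\log L$ plus an $L$-independent remainder, and the mutual information stays finite as $L$ grows. Function names are illustrative.

```python
import math


def single_particle_block_entropy(L: int, ell: int) -> float:
    """Shannon entropy of block outcomes for one particle uniform over L sites."""
    if L <= 0 or not 0 <= ell <= L:
        raise ValueError("require L > 0 and 0 <= ell <= L")
    entropy = 0.0
    if ell > 0:
        # ell outcomes of probability 1/L each: -ell * (1/L) * ln(1/L)
        entropy += ell * (1.0 / L) * math.log(L)
    outside = 1.0 - ell / L  # probability that the block is empty
    if outside > 0.0:
        entropy -= outside * math.log(outside)
    return entropy


def single_particle_mutual_information(L: int, ell: int) -> float:
    """S(block) + S(complement) - S(full system) for the same state."""
    return (single_particle_block_entropy(L, ell)
            + single_particle_block_entropy(L, L - ell)
            - single_particle_block_entropy(L, L))


if __name__ == "__main__":
    for L in (100, 10_000, 1_000_000):
        print(L,
              single_particle_block_entropy(L, L // 4),
              single_particle_mutual_information(L, L // 4))
```

The block entropy grows with $\log L$ while the mutual information column is the same for every $L$ at fixed $x$, which is the cancellation of leading terms in miniature.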

For double-particle states with distinct momenta, the paper gives explicit formulas for the universal large-momentum-difference regime.

At exceptional commensurate momentum values, discrete non-universal constants appear through weight sums whose form depends on bosonic or fermionic statistics (Ye et al., 2023).

The central qualitative result is that local-basis Shannon entropy, unlike entanglement entropy, typically does not separate for quasiparticles with large momentum differences. In the double-particle mutual information, a residual term persists even when the momentum difference becomes large, so the large-momentum regime does not reduce to a sum of single-particle contributions. Numerical results for triple- and quadruple-particle states indicate further universal bosonic and fermionic curves that remain distinct from classical-particle benchmarks. This directly opposes a naive semiclassical additivity picture for local-basis Shannon entropy (Ye et al., 2023).

6. Shannon-style scaling in neural networks and LLMs

In feed-forward perceptron networks, a Shannon-style coding argument yields two linear capacity laws. Under the paper’s assumptions of binary labels and inputs in random position, the Lossless Memory dimension and the MacKay dimension both scale linearly with the number of trainable parameters, including biases. The derivation combines a noiseless transmitter–channel–receiver analogy with Cover–Schläfli dichotomy counting for linear separators. For a single perceptron, the 50% realizability threshold is read off from this dichotomy count, and the network result is then extended additively across perceptrons. In this formulation, the “Shannon” content is not Shannon–Hartley but a bit-counting interpretation of memorization capacity (Friedland et al., 2017).
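The dichotomy-counting step for a single perceptron can be made concrete with Cover's counting function, $C(N,n)=2\sum_{k=0}^{n-1}\binom{N-1}{k}$, which counts the labelings of $N$ points in general position that a linear separator with $n$ free weights can realize. The sketch below is an illustration of that classical result rather than the paper's own code; the function names are assumptions, and `dim` is taken to count all free weights, with a bias treated as one extra weight.

```python
from math import comb


def cover_dichotomies(num_points: int, dim: int) -> int:
    """Cover's count of linearly separable dichotomies of N points in general position."""
    if num_points < 1 or dim < 1:
        raise ValueError("num_points and dim must be positive")
    # math.comb returns 0 when k exceeds num_points - 1, so no clipping is needed.
    return 2 * sum(comb(num_points - 1, k) for k in range(dim))


def separable_fraction(num_points: int, dim: int) -> float:
    """Fraction of all 2**N binary labelings that are linearly separable."""
    return cover_dichotomies(num_points, dim) / 2 ** num_points


if __name__ == "__main__":
    dim = 10
    for num_points in (dim, 2 * dim, 4 * dim):
        print(num_points, separable_fraction(num_points, dim))
```

Every labeling is realizable up to $N=n$ points, exactly half are realizable at $N=2n$, and the fraction falls off quickly beyond that, which is the sense in which the 50% threshold sits at two points per weight.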

A more literal Shannon–Hartley reinterpretation appears in recent work on LLMs under perturbation. The model maps bandwidth to a model-capacity proxy, signal power to training tokens, and noise power to a composite term, yielding a capacity of the Shannon–Hartley form $C=B\log_2(1+S/N)$ with $B$, $S$, and $N$ replaced by these proxies. Loss is then modeled by a reciprocal map of this capacity.

This framework is used to account for catastrophic overtraining, quantization-induced degradation, Gaussian weight perturbation, and U-shaped performance curves. Turnover conditions are given along the token axis and, in the noise-dominated regime, along the model-size axis; in both cases monotone improvement ends once the fitted noise exponents overpower the signal exponent. The paper reports strong fits on Pythia and OLMo2 under Gaussian noise, quantization, and supervised fine-tuning perturbations, including pooled extrapolation to an unseen 12B model (Ouyang et al., 22 May 2026).
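How a noisy-channel ansatz produces a U-shaped loss can be seen in a toy model. The sketch below is not the paper's fitted law: the power-law signal, the noise floor plus power-law noise, the plain reciprocal loss, and every parameter value are hypothetical choices made only to show the mechanism, namely that the signal-to-noise ratio peaks and then declines when the noise exponent exceeds the signal exponent.

```python
import math


def toy_capacity(tokens: float, bandwidth: float = 1.0, signal_exp: float = 0.3,
                 noise_floor: float = 1.0, noise_coeff: float = 1e-6,
                 noise_exp: float = 0.8) -> float:
    """Shannon-Hartley-style capacity B * log2(1 + S/N) with hypothetical S and N."""
    signal = tokens ** signal_exp                            # hypothetical signal power
    noise = noise_floor + noise_coeff * tokens ** noise_exp  # hypothetical noise power
    return bandwidth * math.log2(1.0 + signal / noise)


def toy_loss(tokens: float, **params: float) -> float:
    """Hypothetical reciprocal map from capacity to loss."""
    return 1.0 / toy_capacity(tokens, **params)


def toy_turnover_tokens(signal_exp: float = 0.3, noise_floor: float = 1.0,
                        noise_coeff: float = 1e-6, noise_exp: float = 0.8) -> float:
    """Token count that maximizes S/N for the toy forms above.

    Setting d(S/N)/d(tokens) = 0 gives
    tokens**noise_exp = signal_exp * noise_floor / (noise_coeff * (noise_exp - signal_exp)).
    """
    if noise_exp <= signal_exp:
        raise ValueError("no turnover unless noise_exp exceeds signal_exp")
    ratio = signal_exp * noise_floor / (noise_coeff * (noise_exp - signal_exp))
    return ratio ** (1.0 / noise_exp)


if __name__ == "__main__":
    d_star = toy_turnover_tokens()
    for tokens in (d_star / 100.0, d_star, d_star * 100.0):
        print(f"{tokens:.3e}", toy_loss(tokens))
```

In this toy the loss is lowest at the turnover point and rises on both sides of it; when the noise exponent does not exceed the signal exponent there is no turnover and the loss keeps falling.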

These two neural-network usages share a Shannon framing but formalize different objects. The perceptron law measures memorization capacity in bits under random labeling. The LLM law measures an effective training capacity through a noisy-channel ansatz in which monotonic improvement fails once signal-to-noise ratio degrades with scale. Taken together, they show that Shannon-style scaling in machine learning may refer either to combinatorial storage thresholds or to noise-limited performance surfaces.

7. Assumptions, misconceptions, and unresolved points

Several recurrent misconceptions are explicitly contradicted by the cited literature. First, the expression Shannon Scaling Law does not denote a single universally accepted formula. In wireless networking it is a DoF-limited throughput law; in amplitude-constrained analog channels it is a discrete-to-continuous symbol-growth law; in fractional diffusion it is the logarithmic growth of Shannon entropy under self-similar dilation; in quantum many-body physics it is a universal logarithmic finite-size correction; and in LLM studies it is a noisy-channel performance law (Lee et al., 2010, Abbott et al., 2017, Nezhadhaghighi, 2017, Ouyang et al., 22 May 2026).

Second, linear scaling is highly conditional. In ad hoc networks, linear throughput appears only if $\lambda$ is small enough that $D/\lambda\gtrsim n$; otherwise the DoF ceiling dominates. In molecular communication, linear information growth requires time and molecule resources to scale together; with only one resource growing, the law is logarithmic. In LLMs, monotone scaling breaks when the fitted noise exponents overpower the signal exponent, producing U-shaped curves rather than continued gains (Lee et al., 2010, Eckford et al., 2014, Ouyang et al., 22 May 2026).

Third, basis dependence is indispensable in the quantum Shannon-entropy literature. The universal logarithmic terms for 2D broken-symmetry phases require a basis aligned with an ordering direction; a nonaligned or symmetry-preserving basis can change or remove the logarithms. Likewise, in quasiparticle chains the site-occupation basis, or the corresponding spin basis, produces logarithmic subsystem scaling, whereas a different local basis in the XXX chain exhibits qualitatively different leading behavior (Misguich et al., 2016, Ye et al., 2023).

Finally, several open problems remain explicit. Molecular communication still lacks a closed-form Shannon capacity. The regime $n<1$ for the Shannon–Rényi analysis of broken-symmetry phases is analytically subtle and left for future work. The LLM noisy-channel law is validated on specific model families and perturbation regimes rather than as a universal theorem across all architectures. These limitations underscore a general point: Shannon scaling laws are asymptotic, model-dependent statements whose validity rests on stated channel, basis, symmetry, or noise assumptions, even when their functional forms appear deceptively universal.