Shape-Aware Scaling Laws in Complex Systems
- Shape-aware scaling laws are empirical and theoretical relationships that connect performance metrics to both size and shape parameters across diverse systems.
- They reveal critical design principles by quantifying how architectural, geometric, or network structure variations affect outcomes in deep learning, physics, and network theory.
- Methodologies such as grid sweeps, log-log regressions, and asymptotic analyses enable precise determination of scaling exponents for optimal system design.
Shape-aware scaling laws formalize how performance, energy, or structural observables scale when the "shape"—not merely the size—of a system or model is varied. Here, "shape" can refer to architectural parameters in machine learning (width, depth, embedding dimension), geometric properties in physical systems, or combinatorial structure in networks. Incorporating variable shape reveals phenomena and scaling exponents inaccessible to classical, size-only scaling frameworks, producing sharper predictions and new design principles in domains such as deep learning, statistical mechanics, material science, and network theory.
1. Formalization of Shape-aware Scaling Laws
A shape-aware scaling law is an empirical or theoretical relationship connecting a performance metric M (e.g., validation loss, friction force, energy minimum, width of a condensate, network diameter) not only to scalar measures like total system size N, dataset size D, or compute budget C, but also to specific "shape parameters" s = (s_1, ..., s_k), such as architectural width w and depth d, geometric aspect ratios, or network motifs. The general form is

$$\mathcal{M}(N, D, C, s) \sim A \, N^{-\alpha_N} D^{-\alpha_D} C^{-\alpha_C} \prod_i s_i^{-\beta_i},$$

where the exponent vector (β_1, ..., β_k) encodes "shape" sensitivity. For neural models, s may comprise width w and depth d; for spatial or network systems, s may refer to geometric or topological descriptors.
Shape-aware scaling is established by:
- Varying shape parameters at fixed overall size or budget (e.g., fixed N or C) to empirically or analytically fit the shape exponents β_i.
- Deriving the compute- or energy-optimal "shape" for a given overall system budget.
- Contrasting predictions for shape sensitivity in different problem domains (e.g., deep-wide preference in weather models vs. shape-insensitivity in LLMs).
Explicit shape-aware scaling laws enable model or system design choices that optimize performance or physical observables, given practical constraints.
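As a minimal sketch of how a fitted shape-aware law turns into a design choice, the snippet below searches width/depth splits at a fixed parameter budget under a hypothetical multiplicative power law; the law, exponents, and budget are illustrative assumptions, not values from any cited paper.

```python
# Minimal sketch: derive a budget-optimal shape from a fitted shape-aware law.
# The multiplicative law and all constants are hypothetical illustrations.
ALPHA_W, ALPHA_D = 0.8, 0.2   # hypothetical shape exponents (width-dominant)

def loss(w, d):
    """Hypothetical multiplicative shape-aware law: L = w**-a_w * d**-a_d."""
    return w**-ALPHA_W * d**-ALPHA_D

def optimal_shape(budget, min_depth=2):
    """Exhaustively search integer depths with w * d == budget (w = budget / d)."""
    best = None
    for d in range(min_depth, int(budget**0.5) + 1):
        w = budget / d
        cand = (loss(w, d), w, d)
        if best is None or cand[0] < best[0]:
            best = cand
    return best

L, w, d = optimal_shape(10_000)
print(f"optimal width={w:.0f}, depth={d}, loss={L:.2e}")
```

Because the assumed width exponent dominates the depth exponent, the search drives depth to its floor, i.e., the "wide and shallow" preference discussed for weather models below; with different exponents the same search yields different splits.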
2. Canonical Examples Across Domains
Neural Weather Models
Yu et al. (2026) derive a loss law of the schematic form

$$L(w, d, N, D) \propto w^{-\alpha_w}\, d^{-\alpha_d} \cdot f(N, D),$$

with exponents measured empirically for key weather models. Notably, at fixed parameter count N, loss falls steeply with width (large α_w) and only weakly with depth (small α_d), inverting the "shape-insensitivity" observed in LLMs and yielding the recommendation that compute-optimal weather models should be "as wide as possible, with depth as small as architectural constraints allow" (Yu et al., 26 Feb 2026).
Vision Transformers
Closed-form shape-aware scaling formulas are derived for each principal shape axis (width, depth, MLP size) in ViTs, with the loss along each axis following a power law in that axis and in compute. Compute-optimal shapes allocate width, depth, and MLP dimension proportionally to powers of the compute budget, x_k* ∝ C^{a_k}, with an empirically fitted exponent a_k for each axis (Alabdulmohsin et al., 2023).
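The proportional-allocation rule x_k* ∝ C^{a_k} can be sketched as scaling each shape axis from a reference point; the reference shape, reference compute, and exponents below are hypothetical placeholders, not the fitted values of Alabdulmohsin et al. (2023).

```python
# Illustrative sketch of the per-axis allocation rule x_k* ∝ C**a_k.
# Reference shape, reference compute, and exponents are hypothetical.
REF_COMPUTE = 1e18                        # reference compute budget (FLOPs)
REF_SHAPE = {"width": 768, "depth": 12, "mlp_dim": 3072}
EXPONENTS = {"width": 0.25, "depth": 0.10, "mlp_dim": 0.25}  # hypothetical a_k

def optimal_vit_shape(compute):
    """Scale each shape axis from the reference point as (C / C_ref)**a_k."""
    ratio = compute / REF_COMPUTE
    return {k: round(REF_SHAPE[k] * ratio ** EXPONENTS[k]) for k in REF_SHAPE}

print(optimal_vit_shape(1e20))  # shape at 100x the reference compute
```

Note how unequal exponents change the aspect ratio as compute grows: with a_width > a_depth, larger budgets yield progressively wider, relatively shallower models.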
Condensation in Transport Processes
In stochastic transport with pair-factorized steady states (PFSS), the condensate width W exhibits shape- and regime-dependent scaling exponents:

$$W \sim N^{\alpha(\beta, \gamma)},$$

where the potential parameters β and γ control the surface- and site-energy contributions (Ehrenpreis et al., 2015).
Martensitic Microstructures
For elastic two-well problems, the minimal energy E_min of a martensitic nucleus in an austenitic matrix displays eight shape-dependent scaling regimes (including logarithmic and power-law dependencies in system geometry, modulus ratios, and surface energy), with phase diagrams classifying which microstructure (uniform, laminate, branching) is optimal as aspect ratios and parameters vary (Conti et al., 2020).
Network Geometry and Empirical Systems
Structural network observables such as mean degree, clustering coefficient, mean geodesic distance, and assortativity exhibit domain-specific scaling exponents as functions of network size, directly linked to network "shape" features (degree heterogeneity, modularity, geometric embedding) (Dutta et al., 21 Mar 2026, Molkenthin et al., 2016).
Friction in Twisted Layered Interfaces
Static friction F_s in finite polygonal or circular 2D flakes sliding on twisted hexagonal substrates admits distinct shape-dependent scaling:
- Circular flakes: F_s grows as a power law in flake size, with the exponent reported in the reference.
- Polygonal flakes: F_s is non-scaling (bounded as flake area grows) for generic twist/edge orientations, but exhibits dual-periodicity modulations at special alignments. Thus, macroscale superlubricity can be engineered via shape and orientation design (Yan et al., 2023).
3. Methodologies and Fitted Exponents
The detection and quantification of shape-aware scaling laws rely on systematic measurement or simulation over controlled shape sweeps:
- For neural models, grid sweeps over (width, depth) or higher-dimensional shape tuples, "star sweep" perturbations, and log-log regression provide the shape exponents (e.g., α_w, α_d).
- In physical and mathematical models, a combination of analytic asymptotics (envelope/saddle-point approximations), rigorous upper/lower bounds, and Monte Carlo or molecular dynamics simulations yields scaling regimes and phase boundaries in the space spanned by key geometric or energetic parameters.
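The log-log regression step above can be sketched on a synthetic grid sweep: generate losses from a known power law, fit in log space by ordinary least squares, and recover the exponents. The "true" exponents here are arbitrary test values, not fitted results from any cited paper.

```python
import numpy as np

# Recover shape exponents from a synthetic grid sweep by least squares in
# log-log space: log L = log A - alpha_w * log w - alpha_d * log d.
rng = np.random.default_rng(0)
true_A, true_aw, true_ad = 2.0, 0.8, 0.1   # arbitrary test values

widths = np.array([64, 128, 256, 512, 1024], dtype=float)
depths = np.array([2, 4, 8, 16, 32], dtype=float)
W, D = np.meshgrid(widths, depths)
loss = true_A * W.ravel()**-true_aw * D.ravel()**-true_ad
loss *= np.exp(rng.normal(0.0, 0.01, loss.size))   # small measurement noise

# Design matrix [1, log w, log d]; the fitted slopes are the negated exponents.
X = np.column_stack([np.ones(loss.size), np.log(W.ravel()), np.log(D.ravel())])
coef, *_ = np.linalg.lstsq(X, np.log(loss), rcond=None)
logA, aw, ad = coef[0], -coef[1], -coef[2]
print(f"fitted alpha_w ~ {aw:.3f}, alpha_d ~ {ad:.3f}, A ~ {np.exp(logA):.3f}")
```

A "star sweep" replaces the full grid with one axis varied at a time around a reference shape; the regression step is identical.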
Representative exponents for major applications:
| Domain | Shape Parameters | Exponents (sample) | Reference |
|---|---|---|---|
| Weather models | width w, depth d | α_w ≫ α_d (width-dominant; values fitted in reference) | (Yu et al., 26 Feb 2026) |
| PFSS condensation | β, γ | width exponent α(β, γ) | (Ehrenpreis et al., 2015) |
| Premixed flames | flame geometry parameters | CH4: γ ≈ 0.98 | (Maffei et al., 16 Mar 2026) |
| Neural force fields | symmetry order, representation | exponents up to ≈ 0.8 | (Ngo et al., 10 Oct 2025) |
4. Contrasts and Domain-specific Sensitivities
A key insight from comparative studies is that shape sensitivity is deeply domain-dependent:
- LLMs: Shape-insensitivity dominates (varying width vs. depth at fixed parameter count shifts loss only marginally), with compute-optimal parameter and data budgets both growing as power laws of compute (Yu et al., 26 Feb 2026).
- Weather models: Strong width preference; best practice is "shallow and wide," with a compute-optimal split prioritizing dataset size over parameter count, inverting the LLM prescription.
- Frictional systems: For generic polygons and incommensurate twist, static friction saturates to a bounded value (effectively a zero scaling exponent), but specific edge-moiré alignments restore area scaling.
- Martensitic phase transformations: Domain geometry selects between logarithmic and linear scaling in singular perturbation problems; only "highly compatible" polygons achieve the optimal (linear) scaling, whereas generic domains are penalized by logarithmic terms (Ginster et al., 2024).
5. Mechanistic and Design Implications
Shape-aware scaling laws underpin actionable recommendations and sharpen the design of computational, physical, and experimental systems:
- Neural net design: Allocate parameter budget heavily to width for forecasting tasks; use minimal viable depth; scale data and model size in tandem for compute optimality (Yu et al., 26 Feb 2026, Alabdulmohsin et al., 2023, Ngo et al., 10 Oct 2025).
- Physical systems: Engineer nucleation or pattern selection via geometric tailoring of domains to optimize energy scaling (e.g., nucleating martensitic variants in symmetry-adapted polygonal regions for sharp transitions).
- Superlubricity via shape: Cut 2D materials into generic polygons (edge-moiré misaligned) so that friction saturates at a finite value, or use circular (curved-edge) flakes to obtain the weaker power-law scaling, thereby ensuring macroscale superlubricity (Yan et al., 2023).
- Network modeling: Validate generative models by matching empirically observed scaling exponents for network shape metrics; deviations signal missing mechanisms, such as degree heterogeneity or modularity (Dutta et al., 21 Mar 2026).
6. Extensions, Adaptive Strategies, and Open Directions
Beyond static prescriptions, adaptive and schedule-based shape optimization further exploits scaling-law envelopes. For example, dynamic rescheduling of model shape (width, patch size, context length) during training allows models to traverse piecewise-optimal regimes on the error-compute curve, yielding substantial compute savings and tighter performance guarantees over static architectures (Anagnostidis et al., 2023).
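The piecewise-optimal envelope idea can be sketched as follows: given an error-compute power law per candidate shape, an adaptive schedule follows whichever curve is lowest at the current compute level. The two curves and their constants below are hypothetical, not fits from Anagnostidis et al. (2023).

```python
# Sketch of the piecewise-optimal "envelope" behind adaptive shape schedules:
# at each compute level, pick the shape whose (hypothetical) error-compute
# power law err = A * C**(-alpha) predicts the lowest error.
SHAPES = {                       # name: (prefactor A, exponent alpha)
    "small-wide": (1.0, 0.50),   # cheap shape: fast early progress
    "large-deep": (3.0, 0.70),   # costly shape: better asymptotic exponent
}

def error(shape, compute):
    A, alpha = SHAPES[shape]
    return A * compute**-alpha

def best_shape(compute):
    """Lower envelope: shape with minimal predicted error at this compute."""
    return min(SHAPES, key=lambda s: error(s, compute))

for c in (1e2, 1e4, 1e6):
    print(f"C={c:.0e}: {best_shape(c)} (err={error(best_shape(c), c):.2e})")
```

The two curves cross at a finite compute level, so a schedule that reshapes the model at the crossover stays on the lower envelope, which is the source of the compute savings described above.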
The interaction of shape-aware scaling with inductive biases remains fertile territory, as demonstrated by the substantial increase in scaling exponents (up to ≈ 0.8) when symmetry or equivariance constraints are imposed in neural potential models (Ngo et al., 10 Oct 2025). Similarly, mathematical theory continues to seek a full classification of phase diagrams and shape-exponent transitions in high-dimensional and out-of-equilibrium systems.
In summary, shape-aware scaling laws unify disparate phenomena by illuminating the operational and physical significance of shape parameters, enabling precise optimization of performance or physical quantities across a wide spectrum of scientific and engineering domains.