Justification of mean-field abc-parameterization via non-trivial stability constraints

Establish that the specific abc-parameterization choices used to realize the mean-field regime in wide fully connected networks indeed satisfy the non-trivial stable abc-parameterization constraints a_{L+1}+b_{L+1}+r=1 or 2a_{L+1}+c=1 (equation (constr_poly) in Section 4.3), thereby placing the infinite-width dynamics in the mean-field feature-learning regime. The choices in question are: layer weights parameterized as W_ℓ = M^{-a_ℓ} w^ℓ, a width-independent Gaussian initialization w^ℓ_{αβ} ∼ N(0, M^{-2b_ℓ}), and an SGD learning rate η M^{-c} scaled so that the output-layer 1/M scaling is canceled.
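To make the three scaling ingredients concrete, the following is a minimal Python sketch of one abc-parameterized layer, assuming width M, exponents a, b, c, and a base learning rate eta. The function names (`init_std`, `effective_lr`, `sample_layer`) are hypothetical helpers introduced only for illustration; they do not come from the survey.

```python
import math
import random

def init_std(M: int, b: float) -> float:
    """Standard deviation M^{-b} of the trainable weights w (variance M^{-2b})."""
    return M ** (-b)

def effective_lr(eta: float, M: int, c: float) -> float:
    """Width-scaled SGD learning rate eta * M^{-c}."""
    return eta * M ** (-c)

def sample_layer(M: int, fan_out: int, a: float, b: float, rng: random.Random):
    """Sample trainable weights w ~ N(0, M^{-2b}) and return (w, W) with W = M^{-a} w."""
    std = init_std(M, b)
    w = [[rng.gauss(0.0, std) for _ in range(M)] for _ in range(fan_out)]
    scale = M ** (-a)
    W = [[scale * x for x in row] for row in w]
    return w, W
```

For example, `sample_layer(M, 1, a=1.0, b=0.0, rng=random.Random(0))` draws width-independent weights and applies a 1/M output scaling, while `effective_lr(eta, M, c=-1.0)` returns eta * M.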

Background

The paper reviews the abc-parameterization framework for scaling weights and learning rates in overparameterized fully connected networks, which unifies Neural Tangent Kernel (NTK) and Mean-Field (MF) regimes. In this framework, stable training dynamics at infinite width are characterized by a polyhedral set of linear constraints on (a_ℓ, b_ℓ) and c. Two faces of this polyhedron, given by a_{L+1}+b_{L+1}+r=1 or 2a_{L+1}+c=1, define the non-trivial stable regimes corresponding to MF (feature learning) and NTK (kernel) behaviors.
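The two constraints can be checked mechanically once the exponents are fixed. The sketch below assumes the quantity r is supplied by the caller, since its definition is not reproduced here; the function name `is_nontrivial` and the sample inputs are hypothetical and chosen only to exercise the check.

```python
import math

def is_nontrivial(a_out: float, b_out: float, c: float, r: float,
                  tol: float = 1e-12) -> bool:
    """Return True if a_{L+1} + b_{L+1} + r = 1 or 2 a_{L+1} + c = 1 holds.

    a_out and b_out stand for a_{L+1} and b_{L+1}; r must be computed
    separately by the caller (assumption: it is passed in as a number).
    """
    first = math.isclose(a_out + b_out + r, 1.0, abs_tol=tol)
    second = math.isclose(2.0 * a_out + c, 1.0, abs_tol=tol)
    return first or second

# Illustrative inputs only (not taken from the survey).
print(is_nontrivial(a_out=1.0, b_out=0.0, c=-1.0, r=0.0))
print(is_nontrivial(a_out=1.0, b_out=0.0, c=0.0, r=1.0))
```

Passing r explicitly keeps the check agnostic about how r is derived from the hidden-layer exponents.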

The survey notes that in MF analyses the output-layer 1/M scaling is canceled by an M-scaled learning rate and that weight initializations are width-independent. It conjectures that these manipulations are made precisely to satisfy the non-trivial stability constraints, but this is not proven in the cited works. Formalizing and verifying this connection would clarify why particular scaling choices guarantee MF dynamics in the infinite-width limit.
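As an illustrative arithmetic check, and not the formal verification the question asks for, suppose the manipulations are read as the exponents a_{L+1} = 1 (output-layer 1/M scaling), b_{L+1} = 0 (width-independent initialization), and c = -1 (learning rate ηM). This mapping is our assumption and is not stated in the survey. Under it, the second constraint holds directly, while the first reduces to a condition on r:

```latex
\begin{align*}
2a_{L+1} + c &= 2 \cdot 1 + (-1) = 1, \\
a_{L+1} + b_{L+1} + r &= 1 + 0 + r = 1 \iff r = 0.
\end{align*}
```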

References

Although not mentioned in the paper, we conjecture the reason for these manipulations is to put the parameterizations in the MF regime to satisfy the constraints (constr_poly).

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models (2401.07187 - Suh et al., 14 Jan 2024) in Section 4.3, Unifying View of NTK and Mean-Field Regime