Justification of mean-field abc-parameterization via non-trivial stability constraints
Establish that the specific abc-parameterization choices used to realize the mean-field regime in wide fully connected networks satisfy the non-trivial stable abc-parameterization constraints a_{L+1}+b_{L+1}+r=1 or 2a_{L+1}+c=1 (equation (constr_poly) in Section 4.3), thereby placing the infinite-width dynamics in the mean-field feature-learning regime. The choices in question are the width-independent parameterization of layer weights W_ℓ = M^{-a_ℓ} w^ℓ, the Gaussian initialization w^ℓ_{αβ} ∼ N(0, M^{-2b_ℓ}), and an SGD learning rate scaled as η M^{-c} such that the output-layer 1/M scaling is canceled.
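A minimal sketch of how the two constraints can be checked numerically for a given choice of exponents. The formula used for r is not stated in this text; it is an assumption taken from the standard abc-parameterization literature, r = min(a_{L+1}+b_{L+1}, 2a_{L+1}+c) + c − 1 + min_{ℓ=1..L}[2a_ℓ + 1(ℓ=1)], and should be checked against the paper's own definition. The function names and the example exponents (a one-hidden-layer mean-field choice with a = (0, 1), b = (0, 0), c = −1) are likewise illustrative assumptions. Only non-triviality is tested here; the stability conditions are not.

```python
from fractions import Fraction as F


def r_exponent(a, b, c):
    """Exponent r for an abc-parameterization (ASSUMED formula, see lead-in).

    a, b: sequences of length L+1 holding a_1..a_{L+1} and b_1..b_{L+1}.
    c: learning-rate exponent, i.e. the rate is eta * M**(-c).
    """
    a_out, b_out = a[-1], b[-1]
    # min over layers 1..L of 2*a_l, with +1 added for the first layer only
    hidden = min(2 * a_l + (1 if idx == 0 else 0)
                 for idx, a_l in enumerate(a[:-1]))
    return min(a_out + b_out, 2 * a_out + c) + c - 1 + hidden


def is_nontrivial(a, b, c):
    """Check a_{L+1} + b_{L+1} + r = 1  or  2 a_{L+1} + c = 1."""
    a_out, b_out = a[-1], b[-1]
    return a_out + b_out + r_exponent(a, b, c) == 1 or 2 * a_out + c == 1


# Hypothetical one-hidden-layer (L = 1) mean-field exponents:
# 1/M output scaling (a_2 = 1), O(1) initialization, learning rate eta * M.
mf_a, mf_b, mf_c = [F(0), F(1)], [F(0), F(0)], F(-1)

print("r =", r_exponent(mf_a, mf_b, mf_c))
print("non-trivial:", is_nontrivial(mf_a, mf_b, mf_c))
```

Under these assumptions the example gives r = 0 and satisfies 2a_{L+1}+c = 1, which is the pattern the statement above asks to establish for the paper's actual exponents. Exact rational arithmetic is used so that the equality tests are not affected by floating-point rounding.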
References
Although not mentioned in the paper, we conjecture the reason for these manipulations is to put the parameterizations in the MF regime to satisfy the constraints (constr_poly).