Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks (2507.07675v1)
Abstract: We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of $m$ inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the \$m\times n\$ hidden activation matrices. Our main result shows that $\mathbb{E}[EDim(\ell)]=m[1-(1-2/\pi)\ell]+O(e{-c m})$ so that the rank deficit decays geometrically with ratio $1-2 / \pi \approx 0.3634$. We also prove a sub-Gaussian concentration bound, and identify the "revival" depths at which the expected rank attains local maxima. In particular, these peaks occur at depths $\ell_k*\approx(k+1/2)\pi/\log(1/\rho)$ with height $\approx (1-e{-\pi/2}) m \approx 0.79m$. We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope leaky-ReLU, the rank remains (nearly) full. These results provide a precise characterization of how random ReLU layers alternately collapse and partially revive the subspace of input variations, adding nuance to prior work on expressivity of deep networks.
Sponsor
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.