Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff (2305.19008v3)
Abstract: Previous work has shown that DNNs with large depth $L$ and $L_{2}$-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank $R^{(0)}(f)$ of the learned function $f$, conjectured to be the Bottleneck rank. We compute finite-depth corrections to this result, revealing a measure $R^{(1)}$ of regularity which bounds the pseudo-determinant of the Jacobian $\left|Jf(x)\right|_{+}$ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the 'right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as $L\to\infty$: for large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_{\ell}$ have $R^{(0)}(f)$ singular values close to 1 while the others are $O(L^{-\frac{1}{2}})$. Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
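The bottleneck structure described above can be probed empirically. The sketch below is not the paper's code: it trains a deep, $L_{2}$-regularized (weight-decayed) MLP on a synthetic task whose target factors through a 2-dimensional map, then prints the singular values of each weight matrix, where one would expect a few values near 1 and the rest small. The widths, depth, learning rate, and weight decay are illustrative assumptions and may need tuning.

```python
# Minimal sketch (assumed setup, not the paper's experiments): inspect the
# singular spectrum of the weight matrices of a deep MLP trained with
# weight decay, to look for the bottleneck structure described above.
import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, d_hidden, d_out, depth = 10, 64, 10, 12
n_samples = 512

# Synthetic target with inner dimension 2: x -> g(h(x)) with h: R^10 -> R^2.
A = torch.randn(d_in, 2)
B = torch.randn(2, d_out)
X = torch.randn(n_samples, d_in)
Y = torch.tanh(X @ A) @ B

layers = [nn.Linear(d_in, d_hidden), nn.ReLU()]
for _ in range(depth - 2):
    layers += [nn.Linear(d_hidden, d_hidden), nn.ReLU()]
layers += [nn.Linear(d_hidden, d_out)]
net = nn.Sequential(*layers)

# Weight decay stands in for the L2-regularization; a fairly large learning
# rate is used, in the spirit of the abstract's remark (values are guesses).
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(net(X), Y)
    loss.backward()
    opt.step()

# Under the bottleneck picture, each hidden weight matrix should show a
# handful of singular values close to 1 and the remainder much smaller.
for i, m in enumerate(net):
    if isinstance(m, nn.Linear):
        s = torch.linalg.svdvals(m.weight.detach())
        print(f"layer {i}: top singular values {s[:4].round(decimals=2).tolist()}")
```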