
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff (2305.19008v3)

Published 30 May 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Previous work has shown that DNNs with large depth $L$ and $L_{2}$-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank $R^{(0)}(f)$ of the learned function $f$, conjectured to be the Bottleneck rank. We compute finite-depth corrections to this result, revealing a measure $R^{(1)}$ of regularity which bounds the pseudo-determinant of the Jacobian $\left|Jf(x)\right|_{+}$ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the `right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as $L\to\infty$: for large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_{\ell}$ have $R^{(0)}(f)$ singular values close to 1 while the others are $O(L^{-\frac{1}{2}})$. Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
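
The bottleneck prediction is straightforward to probe empirically. Below is a minimal sketch (not the paper's experimental setup): it trains a deep, weight-decayed MLP in PyTorch on a synthetic target that factors through a 2-dimensional map, then prints the leading singular values of each weight matrix. The architecture, dataset, and hyperparameters are illustrative assumptions; under the paper's result one would expect, for sufficiently large depth, a few singular values near 1 per layer while the rest decay roughly like $L^{-\frac{1}{2}}$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, d_hidden, d_out, depth = 20, 64, 5, 12   # illustrative sizes, not from the paper
n = 2048

# Synthetic target that factors through a 2-dimensional bottleneck,
# so the "true" inner dimension of the task is small.
A = torch.randn(d_in, 2)
B = torch.randn(2, d_out)
X = torch.randn(n, d_in)
Y = torch.tanh(X @ A) @ B

layers = [nn.Linear(d_in, d_hidden), nn.ReLU()]
for _ in range(depth - 2):
    layers += [nn.Linear(d_hidden, d_hidden), nn.ReLU()]
layers += [nn.Linear(d_hidden, d_out)]
net = nn.Sequential(*layers)

# weight_decay plays the role of the L2-regularization; the learning rate is
# kept comparatively large, in the spirit of the paper's remark that large
# learning rates are needed for the representations to converge.
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(net(X), Y)
    loss.backward()
    opt.step()

# A bottleneck structure shows up as a few singular values near 1 in each
# hidden weight matrix, with the remaining ones much smaller.
for i, module in enumerate(net):
    if isinstance(module, nn.Linear):
        s = torch.linalg.svdvals(module.weight.detach())
        print(f"layer {i:2d}: top singular values {[round(v, 3) for v in s[:5].tolist()]}")
```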
