Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets (2405.17573v2)

Published 27 May 2024 in stat.ML, cs.AI, and cs.LG

Abstract: We study Leaky ResNets, which interpolate between ResNets and Fully-Connected nets depending on an 'effective depth' hyper-parameter $\tilde{L}$. In the infinite depth limit, we study 'representation geodesics' $A_{p}$: continuous paths in representation space (similar to NeuralODEs) from input $p=0$ to output $p=1$ that minimize the parameter norm of the network. We give a Lagrangian and Hamiltonian reformulation, which highlights the importance of two terms: a kinetic energy which favors small layer derivatives $\partial_{p}A_{p}$, and a potential energy that favors low-dimensional representations, as measured by the 'Cost of Identity'. The balance between these two forces offers an intuitive understanding of feature learning in ResNets. We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work: for large $\tilde{L}$ the potential energy dominates and leads to a separation of timescales, where the representation jumps rapidly from the high-dimensional inputs to a low-dimensional representation, moves slowly within the space of low-dimensional representations, and then jumps back to the potentially high-dimensional outputs. Inspired by this phenomenon, we train with an adaptive layer step-size that adapts to this separation of timescales.
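
For orientation, the variational picture described in the abstract can be written as a minimal sketch. In the infinite-depth (NeuralODE-like) limit, a representation geodesic $A_{p}$ balances a kinetic term against a potential term along the path, roughly

$\mathcal{S}(A) = \int_{0}^{1} \left[ \tfrac{1}{2}\,\|\partial_{p}A_{p}\|^{2} + \lambda(\tilde{L})\, C(A_{p}) \right] dp,$

subject to $A_{0}$ matching the inputs and $A_{1}$ matching the outputs. Here the first term is the kinetic energy penalizing large layer derivatives and $C$ stands for the 'Cost of Identity' potential; the precise definition of $C$, the weighting $\lambda(\tilde{L})$, and the relative scaling of the two terms are placeholders inferred from the abstract rather than the paper's exact expressions. The qualitative point is that when the potential term dominates, the path spends most of $p \in (0,1)$ near low-dimensional (low-potential) representations, with fast transitions near the two endpoints.

The adaptive layer step-size can likewise be sketched in code. The following PyTorch snippet is a hypothetical illustration, not the paper's implementation: a residual network whose per-layer step sizes are learned (here parametrized by a softmax so they stay positive and sum to one), letting training allocate more of the depth budget to regions where the representation changes quickly.

import torch
import torch.nn as nn

class LeakyResNetSketch(nn.Module):
    """Illustrative residual network with learned per-layer step sizes (assumed form)."""

    def __init__(self, dim: int, depth: int, hidden: int = 128):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(depth)
        ])
        # Unconstrained logits; the softmax in forward() yields positive step sizes
        # summing to one, so depth can concentrate where the representation moves fast.
        self.step_logits = nn.Parameter(torch.zeros(depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        steps = torch.softmax(self.step_logits, dim=0)  # adaptive layer step sizes
        a = x
        for block, dp in zip(self.blocks, steps):
            # Discrete analogue of d(A_p)/dp = f_p(A_p) - A_p (leaky residual update, assumed).
            a = a + dp * (block(a) - a)
        return a

# Toy usage: a forward pass on random data.
model = LeakyResNetSketch(dim=16, depth=10)
out = model(torch.randn(32, 16))
print(out.shape)  # torch.Size([32, 16])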
