When Representations Align: Universality in Representation Learning Dynamics (2402.09142v2)

Published 14 Feb 2024 in cs.LG and q-bio.NC

Abstract: Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that different architectures learn representations with striking qualitative similarities. Here we derive an effective theory of representation learning under the assumption that the encoding map from input to hidden representation and the decoding map from representation to output are arbitrary smooth functions. This theory schematizes representation learning dynamics in the regime of complex, large architectures, where hidden representations are not strongly constrained by the parametrization. We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the "rich" and "lazy" regimes. While many network behaviors depend quantitatively on architecture, our findings point to certain behaviors that are widely conserved once models are sufficiently flexible.
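
The abstract's central claim, that sufficiently flexible networks of different architectures learn qualitatively similar hidden representations, can be probed with a small experiment. The sketch below is illustrative only and is not the paper's method or code: it trains two one-hidden-layer MLPs with different activations (tanh vs. ReLU) on the same toy regression task using plain NumPy gradient descent, then compares their learned hidden representations with linear CKA. The dataset, layer widths, learning rate, and the choice of CKA as the similarity measure are all assumptions made for this demonstration.

```python
# Illustrative sketch (not the paper's code): train two small MLPs with different
# activations on the same regression task and compare their hidden representations
# with linear CKA. All architectural and hyperparameter choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1-D inputs, smooth target function.
X = rng.uniform(-1.0, 1.0, size=(256, 1))
Y = np.sin(3.0 * X)

def init_mlp(d_in, d_hidden, d_out, scale=0.5):
    return {
        "W1": scale * rng.standard_normal((d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": scale * rng.standard_normal((d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def forward(params, X, act):
    H = act(X @ params["W1"] + params["b1"])   # hidden representation
    return H, H @ params["W2"] + params["b2"]

def train(params, act, act_grad, lr=0.05, steps=2000):
    n = X.shape[0]
    for _ in range(steps):
        Z1 = X @ params["W1"] + params["b1"]
        H = act(Z1)
        pred = H @ params["W2"] + params["b2"]
        err = pred - Y                          # residuals; gradients below average over the batch
        dW2 = H.T @ err / n
        db2 = err.mean(axis=0)
        dH = err @ params["W2"].T
        dZ1 = dH * act_grad(Z1)
        dW1 = X.T @ dZ1 / n
        db1 = dZ1.mean(axis=0)
        for k, g in zip(("W1", "b1", "W2", "b2"), (dW1, db1, dW2, db2)):
            params[k] -= lr * g
    return params

def linear_cka(A, B):
    """Linear CKA between two representation matrices (samples x features)."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    hsic = np.linalg.norm(A.T @ B, "fro") ** 2
    return hsic / (np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro"))

tanh, dtanh = np.tanh, lambda z: 1.0 - np.tanh(z) ** 2
relu, drelu = lambda z: np.maximum(z, 0.0), lambda z: (z > 0.0).astype(float)

net_a = train(init_mlp(1, 64, 1), tanh, dtanh)
net_b = train(init_mlp(1, 64, 1), relu, drelu)

H_a, _ = forward(net_a, X, tanh)
H_b, _ = forward(net_b, X, relu)
print(f"linear CKA between hidden representations: {linear_cka(H_a, H_b):.3f}")
```

With these toy settings the similarity score usually comes out high, which is the kind of qualitative agreement across architectures that the paper's effective theory is meant to explain; the exact value depends on the seed, widths, and activations chosen here.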

Authors (2)
  1. Loek van Rossem (1 paper)
  2. Andrew M. Saxe (24 papers)
Citations (3)
