Task structure and nonlinearity jointly determine learned representational geometry (2401.13558v1)
Abstract: The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and showed how the differences between networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
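The comparison described in the abstract can be sketched in a few lines of NumPy. The snippet below is a minimal illustration, not the paper's code: it trains one-hidden-layer networks with Tanh and ReLU activations on a toy binary task and measures how strongly the learned hidden representation aligns with the input geometry versus the label geometry. Linear centered kernel alignment (CKA) is used here as a simple stand-in for the paper's representational-geometry analyses; the task, network sizes, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

def linear_cka(A, B):
    """Linear centered kernel alignment between two representation
    matrices (rows = samples, columns = features); returns a value in [0, 1]."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = np.linalg.norm(A.T @ B, ord="fro") ** 2
    den = np.linalg.norm(A.T @ A, ord="fro") * np.linalg.norm(B.T @ B, ord="fro")
    return num / den

def train_one_hidden_layer(X, Y, act="tanh", n_hidden=64, lr=0.05, steps=1000, seed=0):
    """Full-batch gradient descent on MSE for a one-hidden-layer network.
    Returns the final hidden representation H and the network outputs."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n_hidden))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, k))
    for _ in range(steps):
        Z = X @ W1
        H = np.tanh(Z) if act == "tanh" else np.maximum(Z, 0.0)
        out = H @ W2
        err = out - Y                        # d(MSE)/d(out), up to a constant
        dW2 = H.T @ err / n
        dH = err @ W2.T
        dZ = dH * ((1.0 - H**2) if act == "tanh" else (Z > 0).astype(float))
        dW1 = X.T @ dZ / n
        W1 -= lr * dW1
        W2 -= lr * dW2
    return H, out

# Toy task: Gaussian inputs, binary labels from a random linear teacher.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
labels = (X @ rng.normal(size=10) > 0).astype(int)
Y = np.eye(2)[labels]                        # one-hot targets

for act in ("tanh", "relu"):
    H, _ = train_one_hidden_layer(X, Y, act=act)
    print(f"{act}: CKA(hidden, inputs) = {linear_cka(H, X):.3f}   "
          f"CKA(hidden, labels) = {linear_cka(H, Y):.3f}")
```

Under the paper's account, one would expect the Tanh network's hidden layer to show relatively higher alignment with the label structure, and the ReLU network's to retain relatively more alignment with the raw inputs; exact values depend on the task and initialization.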