
Task structure and nonlinearity jointly determine learned representational geometry (2401.13558v1)

Published 24 Jan 2024 in cs.LG

Abstract: The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.


Summary

  • The paper shows that activation functions (Tanh and ReLU) produce distinct internal representations that either align with target outputs or retain input structures.
  • It analyzes learning dynamics in weight space to show how Tanh integrates label structure holistically while ReLU fosters specialized, region-specific features.
  • The findings guide neural network design by suggesting Tanh for abstract, disentangled representations in low-dimensional tasks and ReLU for preserving detailed input information.

Introduction

Neural networks develop internal representational geometries that largely determine how well they support downstream tasks. These geometries emerge from the interplay of network architecture, the structure of the input data, and the target outputs the network is trained to produce. The paper studies single-hidden-layer networks and shows how the choice of activation function, contrasting Tanh with ReLU, shapes the learned internal representations.
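
To make the setting concrete, below is a minimal sketch of such a single-hidden-layer network in NumPy with a switchable nonlinearity. The layer widths, initialization scale, and function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def init_net(n_in, n_hidden, n_out, seed=0):
    """Small random weights for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_out, n_hidden))
    return W1, W2

def forward(X, W1, W2, nonlinearity="tanh"):
    """Forward pass. X: (n_samples, n_in). Returns hidden activations and outputs."""
    pre = X @ W1.T                        # preactivations of the hidden layer
    if nonlinearity == "tanh":
        H = np.tanh(pre)
    else:                                 # "relu"
        H = np.maximum(pre, 0.0)
    return H, H @ W2.T
```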

Impact of Activation Function on Representational Geometry

A core finding is that the nonlinearity strongly shapes the geometry of the learned representations. Tanh networks develop internal geometries that closely mirror the structure of the target outputs, yielding more abstract, disentangled representations when the outputs are low-dimensional. ReLU networks, by contrast, tend to preserve information about the structure of the raw inputs, producing representations that are less abstract but potentially more versatile for a broader range of downstream applications.
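
One common way to quantify which structure a hidden representation reflects is to compare it against both the inputs and the targets with a similarity index such as linear centered kernel alignment (CKA). The sketch below assumes representations are given as sample-by-feature matrices; it illustrates the general approach, not the paper's exact analysis.

```python
import numpy as np

def linear_cka(A, B):
    """Linear centered kernel alignment between two representations.
    A: (n_samples, d_a), B: (n_samples, d_b). Returns a value in [0, 1]."""
    A = A - A.mean(axis=0)                # center each feature
    B = B - B.mean(axis=0)
    cross = np.linalg.norm(B.T @ A, ord="fro") ** 2
    norm_a = np.linalg.norm(A.T @ A, ord="fro")
    norm_b = np.linalg.norm(B.T @ B, ord="fro")
    return cross / (norm_a * norm_b)

# Usage: with hidden activations H, inputs X, and (e.g. one-hot) targets Y,
# comparing linear_cka(H, X) against linear_cka(H, Y) indicates whether the
# representation tracks input structure or target structure more closely.
```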

Analytical Approach to Understanding Learning Dynamics

An analysis of the learning dynamics in weight space explains why these differences arise. The divergence is attributed to the asymmetric asymptotic behavior of the ReLU function, which drives individual feature neurons to specialize for particular regions of input space. In contrast, neurons with the Tanh nonlinearity tend to integrate the label structure more holistically into the network's representation.
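
A simple diagnostic of the specialization described above is the fraction of inputs that drive each hidden unit into its active regime: a specialized ReLU unit fires for only a small region of input space, while Tanh units tend to respond more broadly. The following measurement is a hypothetical illustration, not a procedure from the paper; the Tanh saturation threshold is an assumption.

```python
import numpy as np

def active_fraction(pre, nonlinearity="relu", tanh_threshold=0.5):
    """Fraction of inputs for which each hidden unit is 'on'.
    pre: (n_samples, n_hidden) preactivations. Low values indicate
    units specialized for a small region of input space."""
    if nonlinearity == "relu":
        return (pre > 0).mean(axis=0)     # ReLU is 'on' when preactivation > 0
    # For Tanh, use a saturation threshold as a rough analogue.
    return (np.abs(np.tanh(pre)) > tanh_threshold).mean(axis=0)
```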

Impact of Task Input-Output Structure

The structure of a task's inputs and outputs further shapes what is learned. When the target outputs are linearly separable, Tanh networks largely discard input structure in favor of representations aligned with the targets, whereas ReLU networks retain more input structure across different task geometries. This interaction between input geometry, label geometry, and nonlinearity highlights the trade-off between abstraction and input fidelity in representation learning.
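
One way to build such a family of tasks is to fix a set of input cluster centers and interpolate the labels between a linear function of those centers (input-aligned) and random per-cluster assignments (unaligned). The generator below is a hypothetical stand-in for the paper's parameterized tasks; the alignment parameter, cluster counts, and noise scale are all assumptions.

```python
import numpy as np

def make_task(n_clusters=8, n_per_cluster=50, n_in=10, align=1.0,
              noise=0.1, seed=0):
    """Toy task family. `align` in [0, 1] controls how strongly the binary
    labels follow a linear function of the input cluster centers (align=1)
    versus a random per-cluster assignment (align=0)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_clusters, n_in))         # input geometry
    w = rng.normal(size=n_in)
    linear_labels = (centers @ w > 0).astype(float)       # input-aligned labels
    random_labels = rng.integers(0, 2, size=n_clusters).astype(float)
    use_linear = rng.random(n_clusters) < align
    labels = np.where(use_linear, linear_labels, random_labels)
    X = np.repeat(centers, n_per_cluster, axis=0)
    X = X + noise * rng.normal(size=X.shape)              # jitter around centers
    Y = np.repeat(labels, n_per_cluster)
    return X, Y
```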

Theoretical and Practical Implications

The paper offers a framework for assessing how neural representations form, empirically demonstrating the considerable effect of activation functions on representational geometry. These insights bear on network design, particularly the choice of nonlinearity given the demands of the task. When generalization to new tasks is the priority, the results suggest that the disentangled representations favored by Tanh may be beneficial; when a task requires retaining detailed input information, ReLU's propensity to specialize may offer advantages. These findings can guide the construction of architectures for applications ranging from transfer learning to adversarial robustness.