
Task structure and nonlinearity jointly determine learned representational geometry (2401.13558v1)

Published 24 Jan 2024 in cs.LG

Abstract: The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic behavior of ReLU, which leads feature neurons to specialize for different regions of input space. By contrast, feature neurons in Tanh networks tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.


Summary

  • The paper shows that activation functions (Tanh and ReLU) produce distinct internal representations that either align with target outputs or retain input structures.
  • It analyzes learning dynamics in weight space to show how Tanh integrates label structure holistically while ReLU fosters specialized, region-specific features.
  • The findings guide neural network design by suggesting Tanh for abstract, disentangled representations in low-dimensional tasks and ReLU for preserving detailed input information.

Introduction

Neural networks develop internal representational geometries that largely determine how well they support downstream tasks. These geometries emerge from the interplay of network architecture, the structure of the input data, and the target outputs the network is trained to produce. The paper studies single-hidden-layer networks and shows how the choice of activation function, contrasting Tanh with ReLU, shapes the learned internal representations.
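
To make the setting concrete, below is a minimal sketch of such a single-hidden-layer network in NumPy with a switchable nonlinearity. The layer widths, initialization scale, and function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def init_net(n_in, n_hidden, n_out, seed=0):
    """Small random weights for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_out, n_hidden))
    return W1, W2

def forward(X, W1, W2, nonlinearity="tanh"):
    """Forward pass. X: (n_samples, n_in). Returns hidden activations and outputs."""
    pre = X @ W1.T                        # preactivations of the hidden layer
    if nonlinearity == "tanh":
        H = np.tanh(pre)
    else:                                 # "relu"
        H = np.maximum(pre, 0.0)
    return H, H @ W2.T
```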

Impact of Activation Function on Representational Geometry

A core finding is that the nonlinearity strongly shapes the geometry of the learned representations. Tanh networks develop internal geometries that closely mirror the structure of the target outputs, yielding more abstract, disentangled representations when the outputs are low-dimensional. ReLU networks, by contrast, tend to preserve information about the structure of the raw inputs, producing representations that are less abstract but potentially more versatile for a broader range of downstream applications.
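
One common way to quantify which structure a hidden representation reflects is to compare it against both the inputs and the targets with a similarity index such as linear centered kernel alignment (CKA). The sketch below assumes representations are given as sample-by-feature matrices; it illustrates the general approach, not the paper's exact analysis.

```python
import numpy as np

def linear_cka(A, B):
    """Linear centered kernel alignment between two representations.
    A: (n_samples, d_a), B: (n_samples, d_b). Returns a value in [0, 1]."""
    A = A - A.mean(axis=0)                # center each feature
    B = B - B.mean(axis=0)
    cross = np.linalg.norm(B.T @ A, ord="fro") ** 2
    norm_a = np.linalg.norm(A.T @ A, ord="fro")
    norm_b = np.linalg.norm(B.T @ B, ord="fro")
    return cross / (norm_a * norm_b)

# Usage: with hidden activations H, inputs X, and (e.g. one-hot) targets Y,
# comparing linear_cka(H, X) against linear_cka(H, Y) indicates whether the
# representation tracks input structure or target structure more closely.
```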

Analytical Approach to Understanding Learning Dynamics

An analysis of the learning dynamics in weight space explains why these differences arise. The divergence is attributed to the asymmetric asymptotic behavior of the ReLU function, which drives individual feature neurons to specialize for particular regions of input space. In contrast, neurons with the Tanh nonlinearity tend to integrate the label structure more holistically into the network's representation.
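
A simple diagnostic of the specialization described above is the fraction of inputs that drive each hidden unit into its active regime: a specialized ReLU unit fires for only a small region of input space, while Tanh units tend to respond more broadly. The following measurement is a hypothetical illustration, not a procedure from the paper; the Tanh saturation threshold is an assumption.

```python
import numpy as np

def active_fraction(pre, nonlinearity="relu", tanh_threshold=0.5):
    """Fraction of inputs for which each hidden unit is 'on'.
    pre: (n_samples, n_hidden) preactivations. Low values indicate
    units specialized for a small region of input space."""
    if nonlinearity == "relu":
        return (pre > 0).mean(axis=0)     # ReLU is 'on' when preactivation > 0
    # For Tanh, use a saturation threshold as a rough analogue.
    return (np.abs(np.tanh(pre)) > tanh_threshold).mean(axis=0)
```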

Impact of Task Input-Output Structure

The structure of a task's inputs and outputs further shapes what is learned. When the target outputs are linearly separable, Tanh networks largely discard input structure in favor of representations aligned with the targets, whereas ReLU networks retain more input structure across different task geometries. This interaction between input geometry, label geometry, and nonlinearity highlights the trade-off between abstraction and input fidelity in representation learning.
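
One way to build such a family of tasks is to fix a set of input cluster centers and interpolate the labels between a linear function of those centers (input-aligned) and random per-cluster assignments (unaligned). The generator below is a hypothetical stand-in for the paper's parameterized tasks; the alignment parameter, cluster counts, and noise scale are all assumptions.

```python
import numpy as np

def make_task(n_clusters=8, n_per_cluster=50, n_in=10, align=1.0,
              noise=0.1, seed=0):
    """Toy task family. `align` in [0, 1] controls how strongly the binary
    labels follow a linear function of the input cluster centers (align=1)
    versus a random per-cluster assignment (align=0)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_clusters, n_in))         # input geometry
    w = rng.normal(size=n_in)
    linear_labels = (centers @ w > 0).astype(float)       # input-aligned labels
    random_labels = rng.integers(0, 2, size=n_clusters).astype(float)
    use_linear = rng.random(n_clusters) < align
    labels = np.where(use_linear, linear_labels, random_labels)
    X = np.repeat(centers, n_per_cluster, axis=0)
    X = X + noise * rng.normal(size=X.shape)              # jitter around centers
    Y = np.repeat(labels, n_per_cluster)
    return X, Y
```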

Theoretical and Practical Implications

The paper offers a framework for assessing how neural representations form, empirically demonstrating the considerable effect of activation functions on representational geometry. These insights bear on network design, particularly the choice of nonlinearity given the demands of the task. When generalization to new tasks is the priority, the results suggest that the disentangled representations favored by Tanh may be beneficial; when a task requires retaining detailed input information, ReLU's propensity to specialize may offer advantages. These findings can guide the construction of architectures for applications ranging from transfer learning to adversarial robustness.