Hadamard Representations: Augmenting Hyperbolic Tangents in RL (2406.09079v3)

Published 13 Jun 2024 in cs.LG

Abstract: Activation functions are one of the key components of a deep neural network. The most commonly used activation functions can be classed into the category of continuously differentiable (e.g. tanh) and piece-wise linear functions (e.g. ReLU), both having their own strengths and drawbacks with respect to downstream performance and representation capacity through learning (e.g. measured by the number of dead neurons and the effective rank). In reinforcement learning, the performance of continuously differentiable activations often falls short as compared to piece-wise linear functions. We provide insights into the vanishing gradients associated with the former, and show that the dying neuron problem is not exclusive to ReLU's. To alleviate vanishing gradients and the resulting dying neuron problem occurring with continuously differentiable activations, we propose a Hadamard representation. Using deep Q-networks, proximal policy optimization and parallelized Q-networks in the Atari domain, we show faster learning, a reduction in dead neurons and increased effective rank.

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Hadamard Representations: Augmenting Hyperbolic Tangents in RL (2406.09079v3)

Summary

Related Papers

Tweets