Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
The paper "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning" by Stefan Elfwing, Eiji Uchibe, and Kenji Doya introduces two activation functions—Sigmoid-Weighted Linear Units (SiLU) and their derivatives (dSiLU)—designed to enhance the performance of neural networks in reinforcement learning tasks. Additionally, the authors argue for a more traditional on-policy learning approach using eligibility traces and softmax action selection, aiming to present it as a competitive alternative to the Deep Q-Network (DQN) methodology.
Neural Network Function Approximation
The paper puts forward the SiLU and dSiLU as novel activation functions. The SiLU is defined as the unit's input multiplied by the sigmoid of that input, and the dSiLU is the derivative of the SiLU with respect to that input. In the reported experiments, SiLUs outperform the commonly used ReLUs, and dSiLUs outperform standard sigmoid units, in several reinforcement learning tasks. Notably, the SiLU has a global minimum value of roughly -0.28, which acts as a "soft floor" on its output and serves as an implicit regularizer that inhibits the learning of weights with large magnitudes.
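For concreteness, here is a minimal NumPy sketch of the two activations based on the definitions above; the function names are illustrative, not taken from the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def silu(z):
    # SiLU: the input multiplied by its sigmoid, z * sigma(z).
    return z * sigmoid(z)

def dsilu(z):
    # dSiLU: derivative of the SiLU with respect to its input,
    # sigma(z) * (1 + z * (1 - sigma(z))).
    s = sigmoid(z)
    return s * (1.0 + z * (1.0 - s))

# The SiLU's global minimum (the "soft floor" mentioned above) is about -0.28,
# reached near z = -1.28.
z = np.linspace(-6.0, 6.0, 1201)
print(silu(z).min())   # ~ -0.278
```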
Comparison with Traditional Methods
The paper advocates a return to a more traditional on-policy learning approach, combining eligibility traces with softmax action selection. The motivation stems from the historical success of Tesauro's TD-Gammon, and the authors aim to show that such methods can still be competitive without the separate target network used in DQN.
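To make the contrast with DQN concrete, the sketch below shows classic Sarsa(λ) with accumulating eligibility traces and softmax action selection in a tabular setting; the `env` interface and hyperparameter values are assumptions for illustration, and the paper itself uses neural-network function approximation rather than a lookup table:

```python
import numpy as np

def softmax_action(q_values, tau, rng):
    # Boltzmann exploration: lower tau -> greedier, higher tau -> more uniform.
    prefs = (q_values - q_values.max()) / tau      # shift for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(q_values), p=probs)

def sarsa_lambda(env, n_states, n_actions, episodes,
                 alpha=0.1, gamma=0.99, lam=0.8, tau=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)                       # eligibility traces
        s = env.reset()
        a = softmax_action(Q[s], tau, rng)
        while True:
            s2, r, done = env.step(a)
            if done:
                delta = r - Q[s, a]                # no bootstrap at terminal states
            else:
                a2 = softmax_action(Q[s2], tau, rng)
                # On-policy TD error: bootstrap from the action actually chosen next.
                delta = r + gamma * Q[s2, a2] - Q[s, a]
            E[s, a] += 1.0                         # accumulating trace
            Q += alpha * delta * E                 # credit recently visited state-action pairs
            E *= gamma * lam                       # decay traces
            if done:
                break
            s, a = s2, a2
    return Q
```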
Empirical Results
SZ-Tetris
In stochastic SZ-Tetris, a simplified yet still challenging version of Tetris, the researchers tested shallow networks with different types of hidden units. The dSiLU network achieved an average score of 263, surpassing the previous state-of-the-art score by about 20%. This key result underscores the efficacy of the dSiLU in this domain.
When deep networks were trained on raw board configurations, the SiLU-dSiLU architecture outperformed the other configurations with an average score of 229, further supporting the robustness of SiLUs and dSiLUs in reinforcement learning tasks.
10x10 Tetris
To further validate their activation functions, the authors applied them to a smaller 10x10 version of Tetris, where learning time is manageable. They achieved a state-of-the-art average score of 4,900 points, notable since past methods, even those utilizing sophisticated feature sets, managed only scores in the range of 3,000 to 4,200 points.
Atari 2600 Domain
To test applicability to high-dimensional state spaces, the authors applied their approach to 12 Atari 2600 games. The SiLU-dSiLU agents outperformed DQN and its successors, Gorila DQN and double DQN, excelling particularly in games such as Asterix and Asteroids. While double DQN achieved normalized mean and median scores of 127% and 105%, the proposed SiLU-dSiLU agents reached 332% and 125%, respectively. This performance reinforces the potential of the new activation functions combined with the traditional on-policy learning approach.
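For reference, normalized scores of this kind are typically computed by subtracting a random-play baseline from the agent's score and dividing by the reference agent's score on the same scale; the exact convention used in the paper is not spelled out here, and the numbers below are purely illustrative:

```python
def normalized_score(agent, random_baseline, reference):
    # Common convention: 100 * (agent - random) / (reference - random),
    # where the reference would be DQN here; this is an assumption about the
    # normalization, not the paper's stated formula.
    return 100.0 * (agent - random_baseline) / (reference - random_baseline)

# Made-up per-game raw scores, for illustration only:
print(normalized_score(agent=5000.0, random_baseline=200.0, reference=3800.0))  # ~133.3
```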
Analysis of Value Estimation and Action Selection
The experiments also evaluated how well TD(λ) and Sarsa(λ) estimate the expected discounted returns, confirming that these on-policy algorithms avoid the overestimation of action values that is a known issue in Q-learning. Furthermore, softmax action selection proved advantageous over ε-greedy selection, especially in games where random exploratory moves can have disproportionately negative consequences.
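To illustrate the last point, the following toy comparison (with invented action values and parameters, not figures from the paper) shows the probability each rule assigns to an action whose estimated value is clearly poor:

```python
import numpy as np

q = np.array([1.0, 0.9, -5.0])    # hypothetical action values; action 2 is clearly bad
epsilon, tau = 0.05, 0.25

# epsilon-greedy: every non-greedy action keeps probability epsilon / |A|,
# no matter how poor its estimated value is.
eps_greedy = np.full(len(q), epsilon / len(q))
eps_greedy[q.argmax()] += 1.0 - epsilon

# softmax (Boltzmann): a clearly bad action is suppressed exponentially.
prefs = (q - q.max()) / tau
softmax = np.exp(prefs) / np.exp(prefs).sum()

print("epsilon-greedy:", eps_greedy)   # bad action kept with ~1.7% probability
print("softmax:       ", softmax)      # bad action has essentially zero probability
```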
Implications and Future Directions
The findings present compelling evidence that SiLUs and dSiLUs can significantly enhance the performance of neural network-based reinforcement learning agents. The success of traditional methods augmented with these novel activation functions also suggests a promising direction for reinforcement learning. Future work might explore hybrid methodologies that combine the on-policy learning advantages demonstrated with the computational efficiencies and stability of separate target networks, dueling architectures, or asynchronous learning frameworks.
Investigating such directions should yield further refinements of reinforcement learning algorithms, pushing the boundaries of what these models can achieve in both domain-specific applications and more general AI tasks.