Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
The paper "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning" by Stefan Elfwing, Eiji Uchibe, and Kenji Doya introduces two activation functions—Sigmoid-Weighted Linear Units (SiLU) and their derivatives (dSiLU)—designed to enhance the performance of neural networks in reinforcement learning tasks. Additionally, the authors argue for a more traditional on-policy learning approach using eligibility traces and softmax action selection, aiming to present it as a competitive alternative to the Deep Q-Network (DQN) methodology.
Neural Network Function Approximation
The paper puts forward the SiLU and dSiLU as novel activation functions. The SiLU is defined as the unit's input multiplied by the sigmoid of that input, and the dSiLU is the derivative of the SiLU with respect to that input. In the reported experiments, SiLUs outperform the commonly used ReLUs, and dSiLUs outperform standard sigmoid units, in several reinforcement learning tasks. Notably, the SiLU has a global minimum value of roughly -0.28, which acts as a "soft floor" on its output and serves as an implicit regularizer that inhibits the learning of weights with large magnitudes.
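For concreteness, here is a minimal NumPy sketch of the two activations based on the definitions above; the function names are illustrative, not taken from the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def silu(z):
    # SiLU: the input multiplied by its sigmoid, z * sigma(z).
    return z * sigmoid(z)

def dsilu(z):
    # dSiLU: derivative of the SiLU with respect to its input,
    # sigma(z) * (1 + z * (1 - sigma(z))).
    s = sigmoid(z)
    return s * (1.0 + z * (1.0 - s))

# The SiLU's global minimum (the "soft floor" mentioned above) is about -0.28,
# reached near z = -1.28.
z = np.linspace(-6.0, 6.0, 1201)
print(silu(z).min())   # ~ -0.278
```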
Comparison with Traditional Methods
The paper advocates a return to a more traditional on-policy learning approach, combining eligibility traces with softmax action selection. The motivation stems from the historical success of Tesauro's TD-Gammon, and the authors aim to show that such methods can still be competitive without the separate target network used in DQN.
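To make the contrast with DQN concrete, the sketch below shows classic Sarsa(λ) with accumulating eligibility traces and softmax action selection in a tabular setting; the `env` interface and hyperparameter values are assumptions for illustration, and the paper itself uses neural-network function approximation rather than a lookup table:

```python
import numpy as np

def softmax_action(q_values, tau, rng):
    # Boltzmann exploration: lower tau -> greedier, higher tau -> more uniform.
    prefs = (q_values - q_values.max()) / tau      # shift for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(q_values), p=probs)

def sarsa_lambda(env, n_states, n_actions, episodes,
                 alpha=0.1, gamma=0.99, lam=0.8, tau=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)                       # eligibility traces
        s = env.reset()
        a = softmax_action(Q[s], tau, rng)
        while True:
            s2, r, done = env.step(a)
            if done:
                delta = r - Q[s, a]                # no bootstrap at terminal states
            else:
                a2 = softmax_action(Q[s2], tau, rng)
                # On-policy TD error: bootstrap from the action actually chosen next.
                delta = r + gamma * Q[s2, a2] - Q[s, a]
            E[s, a] += 1.0                         # accumulating trace
            Q += alpha * delta * E                 # credit recently visited state-action pairs
            E *= gamma * lam                       # decay traces
            if done:
                break
            s, a = s2, a2
    return Q
```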
Empirical Results
SZ-Tetris
In stochastic SZ-Tetris, a simplified yet still challenging version of Tetris, the researchers tested shallow networks with different types of hidden units. The dSiLU network achieved an average score of 263, surpassing the previous state-of-the-art score by about 20%. This key result underscores the efficacy of the dSiLU in this domain.
When deep networks were trained on raw board configurations, the SiLU-dSiLU architecture outperformed the other configurations with an average score of 229, further supporting the robustness of SiLUs and dSiLUs in reinforcement learning tasks.
10x10 Tetris
To further validate their activation functions, the authors applied them to a smaller 10x10 version of Tetris, where learning time is manageable. They achieved a state-of-the-art average score of 4,900 points, notable since past methods, even those utilizing sophisticated feature sets, managed only scores in the range of 3,000 to 4,200 points.
Atari 2600 Domain
To test applicability to high-dimensional state spaces, the authors applied their approach to 12 Atari 2600 games. The SiLU-dSiLU agents outperformed DQN and its successors, Gorila DQN and double DQN, excelling particularly in games such as Asterix and Asteroids. While double DQN achieved normalized mean and median scores of 127% and 105%, the proposed SiLU-dSiLU agents reached 332% and 125%, respectively. This performance reinforces the potential of the new activation functions combined with the traditional on-policy learning approach.
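For reference, normalized scores of this kind are typically computed by subtracting a random-play baseline from the agent's score and dividing by the reference agent's score on the same scale; the exact convention used in the paper is not spelled out here, and the numbers below are purely illustrative:

```python
def normalized_score(agent, random_baseline, reference):
    # Common convention: 100 * (agent - random) / (reference - random),
    # where the reference would be DQN here; this is an assumption about the
    # normalization, not the paper's stated formula.
    return 100.0 * (agent - random_baseline) / (reference - random_baseline)

# Made-up per-game raw scores, for illustration only:
print(normalized_score(agent=5000.0, random_baseline=200.0, reference=3800.0))  # ~133.3
```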
Analysis of Value Estimation and Action Selection
The experiments also evaluated how well TD(λ) and Sarsa(λ) estimate the expected discounted returns, confirming that these on-policy algorithms avoid the overestimation of action values that is a known issue in Q-learning. Furthermore, softmax action selection proved advantageous over ε-greedy selection, especially in games where random exploratory moves can have disproportionately negative consequences.
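To illustrate the last point, the following toy comparison (with invented action values and parameters, not figures from the paper) shows the probability each rule assigns to an action whose estimated value is clearly poor:

```python
import numpy as np

q = np.array([1.0, 0.9, -5.0])    # hypothetical action values; action 2 is clearly bad
epsilon, tau = 0.05, 0.25

# epsilon-greedy: every non-greedy action keeps probability epsilon / |A|,
# no matter how poor its estimated value is.
eps_greedy = np.full(len(q), epsilon / len(q))
eps_greedy[q.argmax()] += 1.0 - epsilon

# softmax (Boltzmann): a clearly bad action is suppressed exponentially.
prefs = (q - q.max()) / tau
softmax = np.exp(prefs) / np.exp(prefs).sum()

print("epsilon-greedy:", eps_greedy)   # bad action kept with ~1.7% probability
print("softmax:       ", softmax)      # bad action has essentially zero probability
```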
Implications and Future Directions
The findings present compelling evidence that SiLUs and dSiLUs can significantly enhance the performance of neural network-based reinforcement learning agents. The success of traditional methods augmented with these novel activation functions also suggests a promising direction for reinforcement learning. Future work might explore hybrid methodologies that combine the on-policy learning advantages demonstrated with the computational efficiencies and stability of separate target networks, dueling architectures, or asynchronous learning frameworks.
Investigating such directions should yield further refinements of reinforcement learning algorithms, pushing the boundaries of what these models can achieve in both domain-specific applications and more general AI tasks.