Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning (1809.02121v3)

Published 6 Sep 2018 in cs.LG and stat.ML

Abstract: Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.

Authors (5)
  1. Tom Zahavy (41 papers)
  2. Matan Haroush (4 papers)
  3. Nadav Merlis (19 papers)
  4. Daniel J. Mankowitz (28 papers)
  5. Shie Mannor (228 papers)
Citations (180)

Summary

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

This paper addresses the challenges posed by large discrete action spaces in Reinforcement Learning (RL), a significant barrier to applying RL in complex domains such as NLP, industrial control systems, and text-based games. The authors introduce the Action-Elimination Deep Q-Network (AE-DQN), an architecture that combines a standard Deep Q-Network (DQN) with an Action Elimination Network (AEN) and a contextual-bandit elimination rule to discard sub-optimal actions.
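To make the interaction between the two networks concrete, the sketch below shows one plausible way an admissibility mask could be applied at action-selection time. This is a minimal illustration, not the paper's implementation: the arrays `q_values` and `elimination_probs` stand in for the outputs of the DQN and AEN heads, and the `threshold` and epsilon-greedy scheme are illustrative choices.

```python
import numpy as np

def select_action(q_values, elimination_probs, threshold=0.5, epsilon=0.1,
                  rng=np.random.default_rng()):
    """Epsilon-greedy selection restricted to admissible (non-eliminated) actions.

    q_values          : (n_actions,) Q-estimates from the DQN head.
    elimination_probs : (n_actions,) AEN predictions that each action is invalid.
    """
    admissible = elimination_probs <= threshold
    if not admissible.any():            # safety net: never eliminate every action
        admissible[:] = True
    candidates = np.flatnonzero(admissible)
    if rng.random() < epsilon:          # explore, but only among admissible actions
        return int(rng.choice(candidates))
    masked_q = np.where(admissible, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Toy usage: five actions, two of which the AEN flags as likely invalid.
q = np.array([0.2, 1.3, -0.4, 0.9, 0.1])
e = np.array([0.1, 0.8, 0.05, 0.9, 0.2])
print(select_action(q, e))              # chooses among actions {0, 2, 4}
```

Restricting both exploration and the greedy choice to the admissible set is what drives the reported speedup: the agent stops spending samples on actions the AEN has already ruled out.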

Core Contributions

The main novelty lies in the action elimination mechanism. Instead of learning Q-values for every possible action, the AE-DQN first identifies and discards actions that are likely to be non-optimal. The approach rests on the assumption that predicting which actions are invalid or inferior is often easier than learning accurate Q-values over the entire action set. Concretely, the AEN is trained to predict an elimination signal provided by the environment, and a linear contextual bandit built on the AEN's learned features eliminates an action only when it is confidently predicted to be invalid; the paper provides theoretical guarantees that, with high probability, valid actions are not eliminated, improving both the robustness and the efficiency of the learning process.
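The following sketch illustrates the flavor of such a confidence-based elimination rule under a linear model of the elimination signal. It is a simplified reading of the idea rather than the paper's algorithm: the ridge-regression statistics, the fixed confidence width `beta`, and the elimination `level` are illustrative assumptions, and in practice the feature vector would come from the AEN's last hidden layer.

```python
import numpy as np

class LinearActionEliminator:
    """Per-action ridge regression on the binary elimination signal,
    with a confidence-bound test for eliminating actions."""

    def __init__(self, n_actions, feature_dim, lam=1.0, beta=1.0, level=0.5):
        # One set of sufficient statistics per action.
        self.V = np.stack([lam * np.eye(feature_dim) for _ in range(n_actions)])
        self.b = np.zeros((n_actions, feature_dim))
        self.beta = beta      # confidence-interval width (illustrative constant)
        self.level = level    # eliminate when the lower bound exceeds this

    def update(self, action, features, elimination_signal):
        """Record one observed (state features, action, elimination signal) triple."""
        self.V[action] += np.outer(features, features)
        self.b[action] += elimination_signal * features

    def is_eliminated(self, action, features):
        """Eliminate only when the signal is confidently predicted to be 1."""
        V_inv = np.linalg.inv(self.V[action])
        theta = V_inv @ self.b[action]          # ridge estimate of the signal model
        pred = float(features @ theta)
        width = self.beta * np.sqrt(float(features @ V_inv @ features))
        return pred - width > self.level

# Toy usage: with only one observation the confidence bound is still wide,
# so nothing is eliminated yet.
elim = LinearActionEliminator(n_actions=3, feature_dim=4)
x = np.full(4, 0.5)
elim.update(0, x, 1.0)   # action 0 was flagged invalid in this context
print(elim.is_eliminated(0, x), elim.is_eliminated(1, x))
```

Erring on the side of keeping actions mirrors the robustness emphasis in the paper: a valid action should only rarely be eliminated.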

Simulation and Results

Simulations conducted in the text-based game Zork showcase the AE-DQN's effectiveness. The architecture achieves a notable speedup and increased robustness compared to vanilla DQN, particularly in settings with over a thousand discrete actions. The results also indicate that AE-DQN progresses through the game significantly faster by eliminating irrelevant actions, highlighting the practical advantages of the approach in both time efficiency and computational resource utilization.

Implications and Future Prospects

The implications of this research are twofold: first, it reduces the computational burden associated with large action spaces, making it feasible to deploy RL in more intricate real-world scenarios; second, it suggests a direction for further research into reducing the overestimation errors commonly encountered in deep RL.

Looking forward, action elimination could meaningfully refine RL strategies across various sectors, including conversational agents and automated planning systems. The paper hints at possible developments in using the method to enhance NLP-based systems such as chatbots and personal assistants. Moreover, exploring action elimination as a way to reduce the effective size of the action space holds promise for large-scale problem domains such as the full Zork game and other dynamically complex systems.

In conclusion, this research offers a compelling alternative to standard DQN approaches, emphasizing the practical advantages of action elimination for handling extensive discrete action spaces and paving the way for more efficient and scalable reinforcement learning solutions.