Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space (1810.06394v1)

Published 10 Oct 2018 in cs.LG, cs.AI, and stat.ML

Abstract: Most existing deep reinforcement learning (DRL) frameworks consider either discrete action space or continuous action space solely. Motivated by applications in computer games, we consider the scenario with discrete-continuous hybrid action space. To handle hybrid action space, previous works either approximate the hybrid space by discretization, or relax it into a continuous set. In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. Our algorithm combines the spirits of both DQN (dealing with discrete action space) and DDPG (dealing with continuous action space) by seamlessly integrating them. Empirical results on a simulation example, scoring a goal in simulated RoboCup soccer and the solo mode in game King of Glory (KOG) validate the efficiency and effectiveness of our method.

Authors (10)
  1. Jiechao Xiong (21 papers)
  2. Qing Wang (341 papers)
  3. Zhuoran Yang (155 papers)
  4. Peng Sun (210 papers)
  5. Lei Han (91 papers)
  6. Yang Zheng (124 papers)
  7. Haobo Fu (14 papers)
  8. Tong Zhang (569 papers)
  9. Ji Liu (285 papers)
  10. Han Liu (340 papers)
Citations (156)

Summary

Parametrized Deep Q-Networks in Hybrid Action Spaces

The paper under review presents a novel approach to deep reinforcement learning (DRL) in scenarios where the action space is a hybrid of discrete and continuous actions, a relatively unexplored setting that standard discrete-only or continuous-only algorithms do not handle directly. The authors introduce Parametrized Deep Q-Networks (P-DQN), a framework that combines the strengths of Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to handle hybrid action environments without the simplifying approximations or relaxations traditionally employed in previous work.

Background and Motivation

In conventional DRL frameworks, the action space is either discrete or continuous. Algorithms such as DQN operate efficiently in discrete action spaces, as demonstrated in high-profile applications like Atari games, while DDPG and its derivatives are used in the continuous action spaces typical of robotic control tasks. However, many applications, such as strategic play in real-time strategy (RTS) and MOBA games, require decision-making over both discrete high-level actions and continuous low-level parameters.

Existing strategies for hybrid action spaces either discretize the continuous parameters, which scales poorly as the discretization becomes finer, or relax the discrete choices into a continuous set, which enlarges the effective action space and discards its natural structure. The P-DQN algorithm avoids both issues by building the parametrization directly into the network architecture, preserving the structure of the action space while enabling efficient learning; the formulation is sketched below.
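Concretely, the hybrid action space and the resulting Bellman-style target can be written as follows; the notation is a paraphrase of the paper's formulation rather than a verbatim reproduction.

```latex
% Hybrid action: a discrete choice k paired with a continuous parameter x_k
\mathcal{A} \;=\; \bigl\{ (k, x_k) \,\bigm|\, k \in \{1, \dots, K\},\; x_k \in \mathcal{X}_k \subseteq \mathbb{R}^{d_k} \bigr\}

% P-DQN learns a Q-function Q(s, k, x_k; w) and a deterministic parameter
% network x_k(s; \theta). The target maximizes only over the discrete index,
% with the continuous arguments supplied by the parameter network:
y_t \;=\; r_t \;+\; \gamma \max_{k \in \{1,\dots,K\}} Q\bigl(s_{t+1},\, k,\, x_k(s_{t+1}; \theta);\, w\bigr)
```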

Methodological Advances

P-DQN employs a two-part network: a Q-network that takes the state together with the continuous parameters and outputs a Q-value for each discrete action, and a deterministic parameter (policy) network that maps the state to continuous parameter values for every discrete action. The Q-network evaluates the discrete choices, while the parameter network supplies the continuous arguments those choices require; together they let P-DQN execute actions that are discretely selected but continuously parameterized, combining the DQN and DDPG machinery. A minimal sketch of the two networks follows.
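The sketch below illustrates the two networks in PyTorch; the layer sizes, class names, and choice of framework are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two P-DQN networks (illustrative assumptions throughout).
import torch
import torch.nn as nn

class ParamNet(nn.Module):
    """Deterministic parameter network: state -> continuous parameters for all discrete actions."""
    def __init__(self, state_dim, param_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, param_dim), nn.Tanh(),  # parameters assumed scaled to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class QNet(nn.Module):
    """Q-network: (state, concatenated parameters) -> one Q-value per discrete action."""
    def __init__(self, state_dim, param_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + param_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state, params):
        return self.net(torch.cat([state, params], dim=-1))
```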

A key methodological insight is the use of a deterministic policy for the parameters: because the parameter network outputs a single parameter vector per state, the maximization in the target reduces to a maximum over the finite set of discrete actions, with each action's continuous argument supplied by the network. This avoids an inner continuous optimization at every update and yields a more efficient learning process than frameworks that approximate or relax the hybrid action space. Action selection and the two training losses are sketched below.
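The following sketch shows greedy action selection and the two losses, a DQN-style temporal-difference loss for the Q-network and a DDPG-style loss that pushes the parameter network toward higher Q-values, under the same illustrative assumptions as above; replay-buffer handling and exploration noise are omitted.

```python
import torch
import torch.nn.functional as F

def select_action(state, q_net, param_net):
    """Greedy hybrid action: plug deterministic parameters into Q, take the best discrete index."""
    with torch.no_grad():
        params = param_net(state)            # x(s): parameters for all discrete actions
        q_values = q_net(state, params)      # Q(s, k, x_k(s)) for every k
        k = int(q_values.argmax(dim=-1))     # discrete part
    return k, params                         # in practice, slice out the k-th parameter block

def losses(batch, q_net, param_net, target_q_net, target_param_net, gamma=0.99):
    s, k, x, r, s_next, done = batch         # tensors from a replay buffer (assumed)
    with torch.no_grad():
        next_params = target_param_net(s_next)
        y = r + gamma * (1 - done) * target_q_net(s_next, next_params).max(dim=-1).values
    q = q_net(s, x).gather(-1, k.unsqueeze(-1)).squeeze(-1)
    q_loss = F.mse_loss(q, y)                                   # DQN-style TD loss
    actor_loss = -q_net(s, param_net(s)).sum(dim=-1).mean()     # DDPG-style: raise Q w.r.t. parameters
    return q_loss, actor_loss
```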

Empirical Evaluation

The empirical assessments conducted by the authors demonstrate the effectiveness of P-DQN across multiple benchmarks: a simulated control example, goal scoring in simulated RoboCup soccer, and the solo mode of the multiplayer online battle arena (MOBA) game King of Glory. Notably, P-DQN learns good policies faster and with less computational overhead than baselines that relax the hybrid space into a continuous set, with the reported results showing improvements in metrics such as goal-scoring success and episode length.

Discussion and Implications

The advancements provided by P-DQN suggest its potential utility in any domain where discrete decisions must be complemented by fine-grained continuous adjustments. This is especially relevant in modern AI applications involving game playing or robot control, where hybrid action spaces are commonplace. Moreover, the technique's foundation in off-policy learning suggests that it might integrate well with transfer learning or scenarios requiring the incorporation of large-scale prior experience datasets, easing the adaptation to new domains.

Future Prospects

Looking forward, the development of P-DQN opens up multiple avenues for further research, particularly in its application to more complex hybrid scenarios such as multi-agent settings or environments featuring more intricate action hierarchies. Additionally, exploring adaptive mechanisms to fine-tune the division of discreteness and continuity might yield further efficiency gains. Researchers could also consider incorporating aspects of meta-learning to allow P-DQN to dynamically adjust its learning parameters to optimize performance across varying tasks.

In summary, P-DQN establishes a solid methodological base for hybrid action spaces, addressing unique challenges with an integrative and efficient design. Its adoption and further development could significantly enhance the applicability of DRL techniques across diverse and complex real-world scenarios.
