Action Branching Architectures for Deep Reinforcement Learning
The paper "Action Branching Architectures for Deep Reinforcement Learning" by Arash Tavakoli, Fabio Pardo, and Petar Kormushev introduces a novel approach to tackle one of the standing challenges in deep reinforcement learning: efficiently handling high-dimensional action spaces. This issue arises predominantly when extending discrete-action algorithms to environments with either high-dimensional discrete actions or continuous actions, where naive discretization leads to a combinatorial explosion in the number of possible actions.
Overview
The paper proposes a neural network architecture, termed action branching, in which a shared decision module is followed by several branches, one per action dimension. This design sidesteps the exponential growth in the number of possible actions: the number of network outputs grows only linearly with the degrees of freedom of the action space. Each action dimension is handled by its own branch with a degree of independence, while the shared module provides the coordination needed for coherent joint actions, yielding a more scalable and efficient decision-making framework.
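A minimal sketch of such a branching network, assuming a PyTorch implementation with a fully connected shared trunk and one small head per action dimension, might look as follows; the class name, layer sizes, and structure are illustrative choices rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BranchingQNetwork(nn.Module):
    """Shared trunk followed by one Q-value head per action dimension (sketch)."""

    def __init__(self, state_dim, num_dims, num_bins, hidden=128):
        super().__init__()
        # Shared decision module: processes the state once for all branches.
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One branch per action dimension, each producing num_bins sub-action values,
        # so the total output count grows linearly: num_dims * num_bins.
        self.branches = nn.ModuleList(
            [nn.Linear(hidden, num_bins) for _ in range(num_dims)]
        )

    def forward(self, state):
        features = self.trunk(state)
        # Stack per-branch outputs into a (batch, num_dims, num_bins) tensor.
        return torch.stack([branch(features) for branch in self.branches], dim=1)
```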
A notable instantiation of this architecture is the Branching Dueling Q-Network (BDQ), an extension of the Dueling Double Deep Q-Network (Dueling DDQN). The efficiency and scalability of BDQ are empirically validated on a set of continuous control tasks. The results show that the BDQ agent performs competitively with the Deep Deterministic Policy Gradient (DDPG) algorithm, with its advantage most pronounced in environments with high action-space dimensionality.
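As an illustration of how an agent built on such a network could act, the sketch below picks one sub-action per branch by taking an independent argmax over each branch's Q-values, with a simple per-branch epsilon-greedy exploration scheme; the function and its exploration details are assumptions for illustration, not a verbatim description of BDQ.

```python
import torch

def select_action(q_network, state, epsilon=0.0):
    """Pick one sub-action per branch (independent epsilon-greedy per dimension)."""
    with torch.no_grad():
        # state is assumed to be a 1-D tensor; add a batch dimension.
        q_values = q_network(state.unsqueeze(0))      # (1, num_dims, num_bins)
        greedy = q_values.argmax(dim=-1).squeeze(0)   # (num_dims,) greedy bin per branch
    # With probability epsilon, explore independently on each branch.
    random_bins = torch.randint_like(greedy, q_values.shape[-1])
    explore = torch.rand(greedy.shape) < epsilon
    return torch.where(explore, random_bins, greedy)  # one bin index per action dimension
```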
Methodology
The paper details how the action branching architecture incorporates several established enhancements to deep Q-learning, including double Q-learning, prioritized experience replay, and the dueling network architecture. BDQ adapts the dueling decomposition by using a common state-value estimator while distributing the state-dependent action advantages across the branches. The choices for error prioritization, gradient rescaling of the shared module, aggregation of the temporal-difference (TD) targets across branches, and loss computation are carefully designed to keep training stable and to drive the model towards a good policy.
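The core computations can be sketched as follows, under the assumptions that the dueling combination subtracts each branch's mean advantage, that the TD target averages the (double-Q) bootstrapped values over branches, and that the loss averages the squared per-branch TD errors. The paper compares several aggregation choices, so this is one plausible instantiation rather than the definitive algorithm.

```python
import torch
import torch.nn.functional as F

def dueling_q(values, advantages):
    """Combine a shared state value with per-branch advantages.

    values:     (batch, 1)                  shared state-value stream
    advantages: (batch, num_dims, num_bins) per-branch advantage streams
    """
    # Subtract each branch's mean advantage so V and A are identifiable.
    centered = advantages - advantages.mean(dim=-1, keepdim=True)
    return values.unsqueeze(-1) + centered   # (batch, num_dims, num_bins)

def bdq_loss(q_online, q_target_next, q_online_next, actions, rewards, dones, gamma=0.99):
    """One plausible BDQ-style update: double-Q targets averaged across branches."""
    # Q-values of the sub-actions actually taken: (batch, num_dims)
    taken = q_online.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        # Double Q-learning: the online network picks the next sub-action per branch...
        next_actions = q_online_next.argmax(dim=-1, keepdim=True)
        # ...the target network evaluates them, and the values are averaged over branches.
        next_q = q_target_next.gather(-1, next_actions).squeeze(-1).mean(dim=-1)
        target = rewards + gamma * (1.0 - dones) * next_q          # (batch,)
    # Mean of the squared per-branch TD errors (and over the batch).
    return F.mse_loss(taken, target.unsqueeze(-1).expand_as(taken))
```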
Results and Implications
The empirical analysis shows that BDQ performs on par with or better than DDPG, with the clearest gains in high-dimensional settings such as complex locomotion tasks (e.g., the Humanoid-v1 environment with 17 action degrees of freedom). By leveraging the shared module for implicit coordination among branches, the architecture achieves what had previously been deemed intractable for discrete-action algorithms.
The findings carry both theoretical and practical implications. Theoretically, they suggest a new perspective on decomposing large action spaces into coordinated sub-actions without an overwhelming increase in computational demand. Practically, they broaden the applicability of discrete-action reinforcement learning algorithms to robotics and other domains that require sophisticated, high-dimensional control strategies.
Future Directions
While the action branching approach has been demonstrated primarily with Q-learning-based algorithms, future exploration could involve integrating other reinforcement learning models, potentially compounding improvements in sample efficiency and policy learning. Further research could also probe the limits of branching architectures, investigating their adaptability and performance across varying tasks beyond continuous control domains.
In summary, the action branching framework makes a substantial contribution to reinforcement learning by allowing existing discrete-action methods to extend naturally to domains with large, multidimensional action spaces while remaining computationally tractable. The paper paves the way for further work on scalable and efficient reinforcement learning in increasingly complex environments.