- The paper introduces MMPRL, a framework that builds and stores multiple distinct policies to boost robot adaptability in varying scenarios.
- It replaces random mutations with DDPG-driven policy exploration, improving both the diversity of the behavior-performance map and the efficiency of its construction.
- Experimental results with hexapod and Walker2D models demonstrate rapid adaptation to environmental changes without the need for retraining.
Insights on Map-based Multi-Policy Reinforcement Learning: Enhancing Robot Adaptability
The paper presents a novel reinforcement learning approach, Map-based Multi-Policy Reinforcement Learning (MMPRL), to enhance the adaptability of robots in dynamic and unpredictable environments. Traditional deep reinforcement learning (DRL) approaches often rely on a single policy, which may not adapt well to significant environmental changes or robot damage. MMPRL addresses these limitations by generating and storing multiple policies, each with distinct behavioral features, in a multi-dimensional discrete map. This repository of diverse behaviors allows robots to swiftly adapt to changes by selecting the most suitable pre-trained policy using Bayesian optimization.
Key Contributions and Methodology
The MMPRL method distinguishes itself by combining DRL with the concept of a behavior-performance map, originally proposed in the intelligent trial-and-error (IT&E) algorithm. The critical innovation in MMPRL is the use of DRL, specifically Deep Deterministic Policy Gradient (DDPG), to replace the random mutation phase in the map creation process of IT&E. This replacement enhances the exploration of high-dimensional policy spaces, affording a more robust repertoire of policies.
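To make the map-creation idea concrete, here is a minimal, illustrative sketch of a MAP-Elites-style behavior-performance map in which candidate policies come from an ongoing learner rather than from random mutation. The `BehaviorPerformanceMap` class, the `evaluate_policy` stub, the descriptor dimensions, and the noise-based stand-in for DDPG updates are assumptions made for illustration, not the authors' code.

```python
import numpy as np

class BehaviorPerformanceMap:
    """Discrete map keyed by a behavior descriptor; each cell keeps the
    best-performing policy found so far (MAP-Elites-style insertion)."""

    def __init__(self, bins_per_dim, descriptor_low, descriptor_high):
        self.bins = np.asarray(bins_per_dim)
        self.low = np.asarray(descriptor_low, dtype=float)
        self.high = np.asarray(descriptor_high, dtype=float)
        self.cells = {}  # cell index (tuple) -> (policy_params, performance)

    def _cell_index(self, descriptor):
        # Discretize the continuous behavior descriptor into a grid cell.
        ratio = (np.asarray(descriptor) - self.low) / (self.high - self.low)
        idx = np.clip((ratio * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def insert(self, policy_params, descriptor, performance):
        """Store the policy if its cell is empty or it beats the incumbent."""
        key = self._cell_index(descriptor)
        if key not in self.cells or performance > self.cells[key][1]:
            self.cells[key] = (policy_params, performance)


def evaluate_policy(policy_params):
    """Stand-in for a simulated rollout: returns (behavior descriptor, return).
    In MMPRL this would come from running the current DDPG policy in MuJoCo."""
    rng = np.random.default_rng(abs(hash(policy_params.tobytes())) % (2**32))
    descriptor = rng.uniform(0.0, 1.0, size=2)   # e.g. gait or contact statistics
    performance = float(rng.normal(loc=1.0))     # e.g. forward distance travelled
    return descriptor, performance


# Map-creation phase: instead of IT&E's random mutations, each candidate's
# parameters are produced by an ongoing learner (stubbed here as noise
# around the current learner parameters).
bp_map = BehaviorPerformanceMap(bins_per_dim=[10, 10],
                                descriptor_low=[0.0, 0.0],
                                descriptor_high=[1.0, 1.0])
learner_params = np.zeros(8)                     # placeholder for DDPG actor weights
for step in range(1000):
    learner_params += 0.01 * np.random.randn(8)  # stand-in for a DDPG update
    descriptor, performance = evaluate_policy(learner_params)
    bp_map.insert(learner_params.copy(), descriptor, performance)

print(f"filled {len(bp_map.cells)} of {10 * 10} cells")
```

In the actual method, the rollout would be a MuJoCo simulation of the current DDPG actor, and the stored parameters would be the actor network's weights rather than the toy vector used here.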
The methodological framework of MMPRL includes:
- Map Creation Phase: This phase involves training numerous policies using DDPG, storing them in a grid-like structure based on their behavioral descriptors, and recording the associated performance metrics. This process precedes real-world deployment, allowing for extensive policy training without the constraints of real-time operation.
- Adaptation Phase: Upon encountering an environmental change or damage, the robot uses Bayesian optimization to swiftly search the map for a policy that maximizes performance in the new context. This phase is designed to operate efficiently under real-time constraints, leveraging pre-trained policies to minimize downtime.
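The adaptation phase follows the IT&E recipe of Bayesian optimization over the map: a Gaussian process whose prior mean is the performance recorded in the map during simulation, updated with each real-world trial, and an upper-confidence-bound rule to choose the next policy to try. Below is a minimal numpy sketch under those assumptions; the grid, `run_on_robot`, and all constants are illustrative stand-ins rather than the paper's implementation.

```python
import numpy as np

def sq_exp_kernel(A, B, length=0.4):
    """Squared-exponential kernel between two sets of behavior descriptors."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X_obs, y_obs, prior_obs, X_all, prior_all, noise=1e-3):
    """GP posterior over all map cells; the map's simulated performances act
    as the prior mean, as in IT&E's adaptation step."""
    K = sq_exp_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    k_star = sq_exp_kernel(X_obs, X_all)
    K_inv = np.linalg.inv(K)
    mu = prior_all + k_star.T @ K_inv @ (y_obs - prior_obs)
    var = 1.0 - np.sum(k_star * (K_inv @ k_star), axis=0)  # kernel diagonal is 1
    return mu, np.maximum(var, 1e-12)

# --- Illustrative stand-ins (not the paper's data or API) --------------------
rng = np.random.default_rng(0)
grid = np.array([(i / 9.0, j / 9.0) for i in range(10) for j in range(10)])
map_perf = rng.uniform(0.5, 1.5, size=len(grid))  # performance stored in the map

def run_on_robot(idx):
    """Stand-in for one real-world rollout after a change: some behaviors
    degrade more than others, plus observation noise."""
    return map_perf[idx] * (0.4 + 0.6 * grid[idx, 0]) + rng.normal(scale=0.02)

# --- Adaptation loop: pick the next policy to try by UCB over the GP ---------
obs_idx, obs_y = [], []
kappa = 0.05                                  # exploration weight in the UCB rule
for trial in range(10):                       # each trial = one rollout on the robot
    if obs_idx:
        mu, var = gp_posterior(grid[obs_idx], np.array(obs_y),
                               map_perf[obs_idx], grid, map_perf)
    else:
        mu, var = map_perf.copy(), np.ones(len(grid))
    nxt = int(np.argmax(mu + kappa * np.sqrt(var)))
    obs_idx.append(nxt)
    obs_y.append(run_on_robot(nxt))

best = obs_idx[int(np.argmax(obs_y))]
print(f"selected cell {best} with observed performance {max(obs_y):.3f}")
```

Each iteration costs exactly one rollout on the changed robot, which is what keeps the adaptation phase within real-time constraints: the expensive exploration was paid for during map creation.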
Experimental Evaluation
The effectiveness of MMPRL is validated in simulation with two robot models, a hexapod and a Walker2D, using the MuJoCo physics engine with OpenAI Gym. The experiments demonstrate the method's ability to adapt to complex changes such as limb injuries, delayed sensory feedback, and variations in terrain. Notably, MMPRL often adapts better than a single-policy DDPG baseline, adjusting rapidly to new scenarios without retraining and showcasing the advantage of a diverse set of pre-trained policies.
Implications and Future Work
The findings have notable implications for the deployment of robots in mission-critical tasks where environmental conditions may be volatile, such as search-and-rescue operations. The enhancement of adaptability without the need for exhaustive retraining mitigates operational risks associated with robotic failures in unforeseen situations.
Future research directions may explore the optimization of the map creation process, potentially through the incorporation of curiosity-driven exploration strategies to further expedite the discovery of diverse behaviors. Additionally, the applicability of MMPRL could be extended to physical robots and humanoid systems, promoting broader adoption in real-world scenarios.
The paper advances the field of adaptive robotics by offering a robust framework that balances the exploration of behavioral diversity and the practical need for rapid adaptation, thereby overcoming a significant limitation in current DRL methodologies.