Meta-World: A Benchmark and Evaluation for Multi-Task and Meta-Reinforcement Learning
The paper entitled "Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning" by Tianhe Yu et al. presents an open-source simulated benchmark designed to evaluate the capabilities of meta-reinforcement learning (meta-RL) and multi-task reinforcement learning (multi-task RL) algorithms. This benchmark, named Meta-World, consists of 50 distinct robotic manipulation tasks, providing a substantially broader task distribution than those typically used in meta-RL research.
Objectives and Contributions
The primary objective of the paper is to create a benchmark that enables the evaluation of algorithms capable of generalizing across multiple, diverse tasks. Many existing meta-RL benchmarks focus on narrow task distributions, such as variations in running velocities for a simulated robot. These narrow benchmarks do not adequately test an algorithm's ability to generalize to entirely new tasks. Meta-World addresses this limitation by including a wide range of tasks involving different robotic manipulation scenarios, thereby enhancing the scope and challenge of the evaluations.
Key contributions of the paper include:
- A Suite of 50 Tasks: The benchmark provides a comprehensive suite of tasks involving a robotic arm, designed to simulate various manipulation tasks such as reaching, grasping, placing, and pushing objects.
- Evaluation of Algorithms: The authors evaluate seven state-of-the-art meta-RL and multi-task RL algorithms on these tasks, providing insights into their performance and limitations.
- Open-Source Code: The benchmark and related code are made available to the research community, enabling reproducibility and further development.
Task Design and Variability
The tasks in Meta-World are designed to exhibit both parametric and non-parametric variability. While each task represents a distinct manipulation challenge, parametric variation in object and goal positions within each task broadens the coverage of scenarios. This variability discourages overfitting to specific configurations and encourages the development of algorithms that generalize across diverse task sets.
The tasks require the robotic arm to perform actions in a shared tabletop environment, promoting the utilization of common structures across the tasks. For instance, tasks like opening a door and pulling a drawer share underlying principles of manipulation but differ in the specifics of object interaction. Such design nuances foster a balance between shared structure and task diversity.
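The parametric side of this variability can be pictured as sampling a new goal (or object) position from a task-specific range each time a task instance is created. The sketch below is illustrative only; the function name and the bounds are hypothetical, not the Meta-World API.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task_variation(goal_low, goal_high):
    """Sample a parametric variation of a task: a goal position drawn
    uniformly from a task-specific box (bounds are hypothetical)."""
    return rng.uniform(goal_low, goal_high)

# Illustrative bounds for a reach-style task on the shared tabletop.
goal = sample_task_variation(np.array([-0.1, 0.8, 0.05]),
                             np.array([0.1, 0.9, 0.3]))
```

Each sampled goal defines one concrete instance of the task, so a single environment yields many distinct but structurally related problems.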
Evaluation Protocol and Metrics
The paper introduces an evaluation protocol with varying levels of difficulty:
- Meta-Learning 1 (ML1): Few-shot adaptation to held-out goal variations within a single environment.
- Multi-Task 1 (MT1): Learning a single policy across 50 goal variations of a single environment, without test-time adaptation.
- Multi-Task 10, Multi-Task 50 (MT10, MT50): Learning policies that generalize to 10 and 50 distinct environments, respectively.
- Meta-Learning 10, Meta-Learning 45 (ML10, ML45): Few-shot adaptation to five held-out test environments after meta-training on 10 and 45 environments, respectively.
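The structure of these evaluation modes can be summarized as a small table of train/test splits. This is a hedged sketch of the protocol as described in the paper, not the `metaworld` package's API; the dictionary layout and field names are my own.

```python
# Each mode maps to (number of training environments, number of held-out
# test environments, whether few-shot adaptation is evaluated).
BENCHMARK_MODES = {
    "MT1":  {"train_envs": 1,  "test_envs": 0, "adaptation": False},
    "MT10": {"train_envs": 10, "test_envs": 0, "adaptation": False},
    "MT50": {"train_envs": 50, "test_envs": 0, "adaptation": False},
    "ML1":  {"train_envs": 1,  "test_envs": 1, "adaptation": True},
    "ML10": {"train_envs": 10, "test_envs": 5, "adaptation": True},
    "ML45": {"train_envs": 45, "test_envs": 5, "adaptation": True},
}

def needs_meta_test(mode):
    """Multi-task modes are evaluated only on training environments;
    meta-learning modes are additionally evaluated on held-out ones."""
    return BENCHMARK_MODES[mode]["adaptation"]
```

Note that ML45 meta-trains on 45 of the 50 environments and holds out the remaining 5 for meta-testing, which is the most demanding generalization setting in the benchmark.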
The success of policies is evaluated based on whether they achieve goal states within specified thresholds, with success metrics defined for each task.
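The per-task success criterion amounts to checking whether the relevant object (or end-effector) ends up within a distance threshold of the goal. A minimal sketch, assuming a Euclidean distance check; the function name and the 0.05 m threshold are illustrative, since Meta-World defines the threshold per task.

```python
import numpy as np

def is_success(achieved_pos, goal_pos, epsilon=0.05):
    """Return 1.0 if the achieved position is within epsilon of the goal,
    else 0.0. Epsilon here is a placeholder; the benchmark sets a
    task-specific threshold for each of the 50 tasks."""
    dist = np.linalg.norm(np.asarray(achieved_pos) - np.asarray(goal_pos))
    return float(dist < epsilon)
```

Averaging this binary outcome over evaluation episodes gives the success rates reported in the experiments.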
Experimental Results and Analysis
The experiments reveal that while single-task RL algorithms can solve individual tasks when provided sufficient data, multi-task RL and meta-RL algorithms face significant challenges in generalizing across diverse tasks. For instance, multi-task SAC (Soft Actor-Critic) achieves a 68% success rate on MT10 tasks but only 38.5% on the more challenging MT50 tasks. Similarly, meta-RL methods such as MAML and RL² exhibit substantial room for improvement, particularly in their ability to generalize to new meta-test tasks.
Implications and Future Directions
The findings suggest that current state-of-the-art algorithms exhibit limited generalization capabilities when faced with a broad and diverse set of tasks. This observation underscores the need for further algorithmic development in both multi-task RL and meta-RL. Potential directions for future research include:
- Algorithmic Enhancements: Developing methods that better leverage shared structures across tasks to improve generalization.
- Benchmark Extensions: Including more complex, long-horizon tasks and addressing real-world constraints such as image-based observations and sparse rewards.
- Training Efficiency: Enhancing the sample efficiency of algorithms to reduce the data requirements for effective learning.
Conclusion
The Meta-World benchmark represents a significant advancement in the evaluation of multi-task and meta-RL algorithms, providing a challenging and comprehensive suite of tasks. The results from initial evaluations indicate substantial room for improvement, paving the way for future research efforts aimed at achieving robust generalization across diverse robotic manipulation tasks. The open-source nature of the benchmark ensures its utility as a tool for the broader research community to drive forward advancements in reinforcement learning.