- The paper demonstrates that MLSH significantly improves learning efficiency by decomposing task policies into reusable sub-policies and task-specific master policies.
- The proposed hierarchical framework trains shared sub-policies and task-specific master policies end-to-end across tasks using standard RL methods.
- Experimental results across varied benchmarks show that MLSH quickly adapts to novel tasks, reducing exploration time in environments with sparse rewards.
An Overview of Meta Learning Shared Hierarchies
The paper "Meta Learning Shared Hierarchies" addresses the challenge of improving sample efficiency in reinforcement learning (RL) by leveraging prior knowledge across related tasks through hierarchical policies. The authors propose a meta-learning framework called MLSH (Meta Learning Shared Hierarchies), which aims to decompose task policies into reusable, shared sub-policies, or motor primitives, and task-specific master policies.
A salient point of MLSH is its ability to adapt quickly to novel tasks within a distribution by reusing sub-policies that are optimized across tasks. A master policy switches between these sub-policies depending on the task at hand. This hierarchical approach is akin to the options framework, but the focus here is on learning sub-policies shared across a distribution of tasks rather than within a single task.
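To make the switching mechanism concrete, the following is a minimal sketch of a two-level rollout. The Gym-style environment interface, the policy callables, and the switching period `N` are illustrative assumptions rather than the paper's implementation.

```python
def hierarchical_rollout(env, master_policy, sub_policies, horizon=1000, N=10):
    """Roll out a two-level policy: the master picks a sub-policy index
    every N timesteps; the chosen sub-policy emits primitive actions.

    `env`, `master_policy`, and `sub_policies` are assumed interfaces
    (a Gym-style environment and callables from observations to outputs).
    """
    obs = env.reset()
    total_reward = 0.0
    active = None  # index of the currently selected sub-policy
    for t in range(horizon):
        if t % N == 0:
            active = master_policy(obs)      # master acts on the slower timescale
        action = sub_policies[active](obs)   # sub-policy acts at every step
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

The switching period `N` is the temporal-abstraction knob: larger values give the master fewer, coarser decisions per episode.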
Methodology
The key technical contributions of this paper include:
- Hierarchical Policy Structure: The paper formalizes the learning of hierarchical policies by defining a set of shared sub-policies and a task-specific master policy. The master policy selects a sub-policy at fixed intervals, and the chosen sub-policy then acts in the environment at every timestep until the next selection, so the master operates on a slower timescale.
- Optimization Framework: A distinctive aspect is the optimization objective used to determine an effective hierarchy: it seeks sub-policies that minimize the time needed to learn a new task, i.e., that allow the master policy for a freshly sampled task to be trained quickly.
- End-to-End Training: The authors use any standard RL method to train task-specific policies end-to-end on top of the shared sub-policies. By repeatedly sampling tasks from the distribution, they let the sub-policies evolve into generally useful primitives while master policies are trained from scratch for each task; a minimal sketch of this loop follows the list.
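The sketch below outlines the outer training loop. The warmup/joint split follows the training scheme described in the paper, but every callable name and hyperparameter value here is a placeholder, not the authors' code.

```python
def mlsh_meta_train(sample_task, make_master, sub_policies,
                    update_master, update_joint,
                    meta_iters=1000, warmup_iters=20, joint_iters=40):
    """Outer loop of MLSH (sketch). All callables are assumed interfaces:
    `sample_task` draws a task from the distribution, `make_master` returns
    a freshly initialized master policy, and the two update functions each
    perform one RL step (e.g. a policy-gradient update) on the named parts.
    """
    for _ in range(meta_iters):
        task = sample_task()    # repeatedly sample tasks from the distribution
        master = make_master()  # the master policy is trained from scratch per task
        # Warmup: only the task-specific master is updated, so the shared
        # sub-policies are not disturbed while the switching behaviour adapts.
        for _ in range(warmup_iters):
            update_master(master, sub_policies, task)
        # Joint phase: master and shared sub-policies are updated together,
        # letting the sub-policies drift toward generally useful primitives.
        for _ in range(joint_iters):
            update_master(master, sub_policies, task)
            update_joint(master, sub_policies, task)
    return sub_policies
```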
The authors validate MLSH across multiple benchmark environments, ranging from simple 2D tasks to complex 3D robotic control tasks. Noteworthy experimental results demonstrate that MLSH not only discovers meaningful sub-policies but also allows for efficient learning in environments with sparse rewards, a scenario where traditional RL methods often struggle.
Implications
The MLSH framework has practical implications for reinforcement learning in multi-task settings. By effectively leveraging shared hierarchical policies, MLSH enhances sample efficiency and adaptability, critical factors for scaling RL algorithms to real-world applications. It paves the way for future research on task decomposition and transfer learning within RL.
Theoretically, the MLSH model can be understood as a form of curriculum learning within the RL domain, structuring a learning path through skill acquisition (sub-policies) that can be combined to tackle composite tasks, thereby reducing the exploration space. This capability is especially relevant for robotic systems and other domains where task variations are vast, and interactions can be costly.
Future Directions
Potential future work could explore extensions of MLSH in which soft policy selection allows continuous parameterization of sub-policies, enabling smoother transitions between skills. Further research might also examine how these sub-policies could be evolved or refined autonomously as the task distribution shifts over time.
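As one illustration of what soft selection could look like, the sketch below blends the continuous actions of the sub-policies using master-produced softmax weights. This is a hypothetical extension for illustration only; it is not part of MLSH as published, and all names are assumed.

```python
import numpy as np

def soft_mixture_action(obs, master_logits_fn, sub_policies):
    """Soft sub-policy selection: instead of a hard switch, the master
    outputs logits that weight the sub-policies' continuous actions."""
    logits = master_logits_fn(obs)                         # shape: (num_sub_policies,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                               # softmax over sub-policies
    actions = np.stack([pi(obs) for pi in sub_policies])   # (num_sub_policies, action_dim)
    return weights @ actions                               # convex combination of actions
```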
Additionally, the framework's applicability to multi-agent scenarios could be investigated; shared hierarchies might facilitate cooperative or competitive strategies among agents in complex environments.
In conclusion, MLSH provides a robust framework that harmonizes meta-learning with hierarchical reinforcement learning. Its promising results across various domains underscore the potential for MLSH to be a foundational approach for developing adaptive AI systems capable of dynamic task resolution. The work sets the stage for further research into scalable RL architectures that effectively manage task diversity and complexity.