Modular Multitask Reinforcement Learning with Policy Sketches (1611.01796v2)

Published 6 Nov 2016 in cs.LG and cs.NE

Abstract: We describe a framework for multitask deep reinforcement learning guided by policy sketches. Sketches annotate tasks with sequences of named subtasks, providing information about high-level structural relationships among tasks but not how to implement them; specifically, they do not provide the detailed guidance used by much previous work on learning policy abstractions for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic motivations). To learn from sketches, we present a model that associates every subtask with a modular subpolicy, and jointly maximizes reward over full task-specific policies by tying parameters across shared subpolicies. Optimization is accomplished via a decoupled actor-critic training objective that facilitates learning common behaviors from multiple dissimilar reward functions. We evaluate the effectiveness of our approach in three environments featuring both discrete and continuous control, and with sparse rewards that can be obtained only after completing a number of high-level subgoals. Experiments show that using our approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.

Authors (3)
  1. Jacob Andreas (116 papers)
  2. Dan Klein (99 papers)
  3. Sergey Levine (531 papers)
Citations (447)

Summary

Modular Multitask Reinforcement Learning with Policy Sketches

The paper presents a novel approach to multitask deep reinforcement learning (RL) built around policy sketches. A sketch annotates a task with a short symbolic sequence of named subtasks, delineating high-level structure without specifying how to implement it. This abstraction lets the model exploit structural relationships among tasks without detailed environmental grounding such as intermediate rewards or explicit subtask completion signals.
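To make the representation concrete, here is a toy encoding of policy sketches, assuming a crafting-style domain in the spirit of the paper's experiments; the specific task and subtask names are invented for illustration, not taken from the paper:

```python
# Hypothetical encoding of policy sketches: each task is annotated with an
# ordered sequence of named subtask symbols, and nothing else (no rewards,
# no completion signals). All names below are illustrative.
SKETCHES = {
    "make_plank": ["get_wood", "use_workbench"],
    "make_rope":  ["get_grass", "use_toolshed"],
    "make_bed":   ["get_wood", "use_toolshed", "get_grass", "use_workbench"],
}
# Because "get_wood" appears in several sketches, every task containing it
# reuses (and helps train) the same modular subpolicy for that symbol.
```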

Methodology

The proposed framework associates each named subtask with a modular subpolicy. Because subpolicy parameters are tied across every task whose sketch contains the corresponding symbol, the model jointly maximizes reward over the full task-specific policies. Optimization uses a decoupled actor-critic objective: the subpolicies (actors) are shared across tasks, while each task retains its own critic, which lets common behaviors be learned from multiple dissimilar reward functions. This is especially beneficial in sparse-reward settings where successful task completion requires fulfilling several high-level subgoals in sequence.
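As an illustration, here is a minimal sketch of how shared subpolicies and task-specific critics could be wired together, assuming flat feature observations and discrete actions. This is not the authors' implementation; all dimensions, names, and the toy update rule are assumptions:

```python
# Minimal sketch (not the paper's code) of modular subpolicies with a
# decoupled actor-critic update. Dimensions and names are illustrative.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 32, 5

class SubPolicy(nn.Module):
    """One module per subtask symbol, shared by every task whose sketch names it."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, N_ACTIONS))

    def dist(self, obs):
        # Categorical distribution over discrete actions from network logits.
        return torch.distributions.Categorical(logits=self.net(obs))

# Shared actors: one subpolicy per subtask symbol.
subpolicies = {s: SubPolicy() for s in ("get_wood", "use_workbench", "use_toolshed")}

# Decoupled critics: one state-value baseline per *task*, since each task has
# its own reward function, while the actors above are shared across tasks.
critics = {t: nn.Linear(OBS_DIM, 1) for t in ("make_plank", "make_bed")}

params = [p for m in (*subpolicies.values(), *critics.values())
          for p in m.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

def actor_critic_step(task, symbol, obs, actions, returns):
    """One toy update: reinforce the shared subpolicy for `symbol` with an
    advantage computed against `task`'s own critic."""
    actor, critic = subpolicies[symbol], critics[task]
    advantage = returns - critic(obs).squeeze(-1)
    actor_loss = -(actor.dist(obs).log_prob(actions) * advantage.detach()).mean()
    critic_loss = advantage.pow(2).mean()
    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()

# Example call with placeholder experience from one task/subtask pair.
obs = torch.randn(8, OBS_DIM)              # batch of observations
acts = torch.randint(0, N_ACTIONS, (8,))   # actions taken
rets = torch.randn(8)                      # empirical returns
actor_critic_step("make_plank", "get_wood", obs, acts, rets)
```

The key design point is that the critic is indexed by task while the actor is indexed by subtask symbol, so a symbol shared by several sketches accumulates gradient from all of them even though their reward functions differ.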

Empirical Evaluation

The framework was assessed in three distinct environments with both discrete and continuous control challenges. These environments included a crafting-based 2-D game, maze navigation, and a 3-D robot locomotion task. Notably, the rewards in these environments were sparse, highlighting the need for effective hierarchical policy learning.

The experimental results demonstrated that the proposed method outperformed contemporary techniques for learning both task-specific and shared policies. The system effectively induced a library of interpretable policy fragments, facilitating rapid adaptation to new tasks through recombination of these learned primitives.

Implications

The approach significantly reduces the need for fine-grained supervision traditionally required in hierarchical RL. Policy sketches suffice for obtaining the benefits of hierarchical policy learning, offering substantial performance improvements over fully unsupervised methods. By learning modular policies, the framework provides a robust scaffold for task adaptation and transfer, with potential zero-shot generalization capabilities.
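To make the adaptation claim concrete, a short continuation of the earlier sketch: a task never seen in training, whose sketch uses only known subtask symbols, can be executed by chaining the already-trained modules. (In the paper, each subpolicy also learns when to signal subtask completion; that detail is omitted here, and the names remain hypothetical.)

```python
# Zero-shot recombination in the toy setup above: a new sketch over
# already-trained subtask symbols reuses the shared modules in order,
# introducing no new parameters.
new_sketch = ["get_wood", "use_toolshed", "use_workbench"]
new_policy = [subpolicies[s] for s in new_sketch]  # reuse, no retraining

obs = torch.zeros(1, OBS_DIM)                      # placeholder observation
action = new_policy[0].dist(obs).sample()          # act with the first module
```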

Future Directions

Potential future work may involve integrating this approach with language-based task definitions, enabling policy learning directly from natural language instructions. Additionally, extending this modular policy learning framework to more complex, real-world scenarios could further elucidate its practical benefits.

Contributions and Related Work

The paper builds upon previous work in hierarchical RL and modular network composition. Notably, it extends ideas from hierarchical abstract machines (HAMs) and similar frameworks by coupling high-level symbolic task descriptions to subpolicies without explicit state grounding. This extension reflects advances in learning compositional deep architectures in interactive environments under sparse reward conditions.

Overall, this work offers a compelling paradigm for multitask RL, leveraging the abstraction of policy sketches to achieve efficient and interpretable policy learning across diverse tasks.
