Accelerating Reinforcement Learning with Learned Skill Priors
The paper "Accelerating Reinforcement Learning with Learned Skill Priors" proposes a method to expedite the learning process in reinforcement learning (RL) by transferring previously acquired skills to new tasks. This method, defined as Skill-Prior Reinforcement Learning (SPiRL), utilizes a deep latent variable model to construct an embedding space for skills and to learn a skill prior from offline data. The authors articulate that conventional RL methods generally neglect prior experiences, which necessitates accumulating extensive new data for learning each discrete task.
Methodological Framework
SPiRL learns two components from unstructured agent experience: a skill embedding and a skill prior. The skill repertoire is embedded in a continuous latent space, allowing rich behaviors to be represented while keeping exploration tractable, and the learned prior indicates which skills are promising in the current state, balancing broad exploration of the skill space against state-specific skill selection.
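To make these two components concrete, the following is a minimal PyTorch-style sketch of a skill-embedding VAE paired with a state-conditioned skill prior. It assumes fixed-length skills (sequences of H actions), an encoder over the action sequence, a decoder that reconstructs the actions from the skill latent, and a prior network conditioned on the first state of the sequence; the module names, dimensions, MLP architectures, and the unit-Gaussian regularizer are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of a skill-embedding VAE with a learned, state-conditioned skill prior,
# in the spirit of SPiRL. Dimensions and architectures are illustrative assumptions.
import torch
import torch.nn as nn

H, ACT_DIM, STATE_DIM, Z_DIM = 10, 4, 32, 10  # assumed sizes


class SkillModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder q(z | a_{1:H}): maps a flattened action sequence to Gaussian parameters.
        self.encoder = nn.Sequential(nn.Linear(H * ACT_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, 2 * Z_DIM))
        # Decoder: reconstructs the action sequence from the skill latent z.
        self.decoder = nn.Sequential(nn.Linear(Z_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, H * ACT_DIM))
        # Skill prior p_a(z | s_1): predicts which skills are likely given a state.
        self.prior = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                   nn.Linear(128, 2 * Z_DIM))

    @staticmethod
    def _gaussian(params):
        mu, log_std = params.chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_std.exp())

    def forward(self, actions, first_state):
        # actions: (B, H, ACT_DIM); first_state: (B, STATE_DIM)
        q = self._gaussian(self.encoder(actions.flatten(1)))
        z = q.rsample()                                    # reparameterized sample
        recon = self.decoder(z).view(-1, H, ACT_DIM)
        p_prior = self._gaussian(self.prior(first_state))
        return recon, q, p_prior


def skill_loss(model, actions, first_state, beta=1e-2):
    recon, q, p_prior = model(actions, first_state)
    unit = torch.distributions.Normal(torch.zeros_like(q.loc), torch.ones_like(q.scale))
    recon_loss = ((recon - actions) ** 2).mean()                        # action reconstruction
    kl_reg = torch.distributions.kl_divergence(q, unit).sum(-1).mean()  # VAE regularizer
    # Train the prior to match the (detached) encoder posterior, so it learns
    # which skills the offline data tends to execute from a given state.
    q_detached = torch.distributions.Normal(q.loc.detach(), q.scale.detach())
    prior_loss = torch.distributions.kl_divergence(q_detached, p_prior).sum(-1).mean()
    return recon_loss + beta * kl_reg + prior_loss
```

The small MLPs here stand in for the sequence models one would typically use over action trajectories; the key structural point is that the prior is trained against the encoder's posterior, so at test time it can propose plausible skills from the current state alone.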
The researchers extend common maximum-entropy RL approaches by replacing the entropy term with a KL-divergence term between the policy and the learned skill prior. This regularization enables effective policy learning in the high-dimensional skill space while making better use of the available offline data.
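Concretely, where maximum-entropy RL augments the return with an entropy bonus over actions, the skill-prior-regularized objective instead penalizes divergence from the learned prior. In paraphrased notation, with a high-level policy \(\pi(z \mid s_t)\) over skill latents \(z\), the learned prior \(p_a(z \mid s_t)\), a temperature \(\alpha\), and \(\tilde{r}\) the reward accumulated while a decoded skill executes, the objective takes roughly the form:

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\,\sum_{t} \tilde{r}(s_t, z_t)
  \;-\; \alpha\, D_{\mathrm{KL}}\!\big(\pi(z_t \mid s_t)\,\big\|\,p_a(z_t \mid s_t)\big)\right]
```

Choosing a uniform prior for \(p_a\) recovers the standard entropy bonus up to an additive constant, which is the sense in which the approach generalizes maximum-entropy RL.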
Experimental Validation
The authors conduct experiments in complex environments, including maze navigation and robotic manipulation tasks, both recognized as challenging, long-horizon problems. The results demonstrate that SPiRL leverages skill priors to guide exploration and improves learning efficiency over baselines that do not exploit skill transfer.
Noteworthy are the experiments on exploration behavior. Through visual and statistical analyses, the authors show that SPiRL-equipped agents explore more effectively by concentrating on relevant regions of the skill space. The navigation and robotic manipulation experiments together corroborate the model's ability to solve downstream tasks, underscoring its robustness to sparse reward signals.
Implications and Future Directions
The insights from this research have both practical and theoretical implications for reinforcement learning. Practically, SPiRL addresses the inefficiency of exploring each new task from scratch, offering a path toward more adaptive and cost-effective RL, which is especially relevant in domains such as autonomous driving or robotics where exploration is resource-intensive or hazardous.
Theoretically, the paper advances the understanding of skill priors in RL, proposing a structured way to integrate prior experience into the learning process. The ability to reuse a learned skill prior across different tasks points toward more general agents that can switch between tasks with minimal retraining.
The researchers acknowledge the potential to extend SPiRL with methods for learning variable-length skills, which would increase its adaptability to varied task demands. They also note that deploying learned priors in safety-critical settings opens a promising discussion about keeping RL policies within safe operational boundaries.
Overall, the paper contributes a forward-looking approach to RL, offering an empirical and conceptual basis for accelerating learning through the transfer of skill-based knowledge. It points toward more autonomous and resilient learning systems that leverage past experience for future tasks, in line with the broader AI goal of emulating human-like learning efficiency.