Insightful Overview of Unsupervised Perceptual Rewards for Imitation Learning
The paper "Unsupervised Perceptual Rewards for Imitation Learning" addresses core challenges in deploying reinforcement learning (RL) agents in real-world environments, specifically the high costs associated with designing reward functions and requiring extensive demonstration data. The authors propose an innovative method that leverages the abstraction power of intermediate visual representations in deep models to overcome these challenges. This method efficiently deduces perceptual reward functions from a minimal set of demonstrations, enhancing the ability of RL agents to perform complex real-world tasks without necessitating explicit sub-goal specifications.
Methodology and Contributions
The central contribution of this paper is a technique for automatically identifying key intermediate steps of a task from only a small number of demonstration sequences. This is achieved by taking intermediate activations of a pre-trained deep model and selecting, for each step, the features that most discriminate it from the rest of the demonstration. The resulting reward functions are dense and smooth, which substantially aids RL agent learning.
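To make the idea concrete, the following minimal sketch illustrates one way such a discriminative-feature selection step could look. The function name select_discriminative_features, the Fisher-style variance-ratio criterion, and the assumption that network activations are already available as NumPy arrays are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def select_discriminative_features(step_feats, other_feats, n_keep=32):
    """Return indices of the activations that best separate the frames of
    one intermediate step from all remaining demonstration frames.

    step_feats:  (n_step_frames, n_features) activations for this step.
    other_feats: (n_other_frames, n_features) activations for the rest.
    Both are assumed to come from an intermediate layer of a pre-trained
    deep network applied to demonstration video frames.
    """
    # A simple separation score: squared difference of the per-feature means,
    # normalized by the pooled variance (a Fisher-style criterion; the
    # paper's exact ranking heuristic may differ).
    pooled_var = np.concatenate([step_feats, other_feats]).var(axis=0) + 1e-8
    score = (step_feats.mean(axis=0) - other_feats.mean(axis=0)) ** 2 / pooled_var
    return np.argsort(score)[-n_keep:]

# Tiny usage with random stand-in features (real inputs would be activations
# of a pre-trained network on demonstration frames).
rng = np.random.default_rng(0)
step = rng.normal(loc=1.0, size=(15, 64))   # frames near one sub-goal
rest = rng.normal(loc=0.0, size=(60, 64))   # all remaining demo frames
print(select_discriminative_features(step, rest, n_keep=8))
```

Scoring a new frame only in this reduced feature subspace is what keeps the reward signal informative while discarding activations that are irrelevant to the sub-goal.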
A key insight of the proposed method is that meaningful sub-goals can be extracted from video demonstrations without any additional sensory inputs. This is accomplished through an approximation inspired by MaxEnt inverse reinforcement learning: rather than modeling the full distribution over trajectories, the authors assume independence across time steps and across features, which reduces learning to a simple and computationally cheap rule.
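Under such an independence assumption, the learning rule can reduce to fitting each feature's statistics for each step separately and summing per-feature log-probabilities at evaluation time. The sketch below shows this simplification with independent Gaussians; the specific functions (fit_step_model, step_log_likelihood), the Gaussian choice, and the variance floor are assumptions for illustration rather than the authors' exact formulation.

```python
import numpy as np

def fit_step_model(step_frames):
    """Fit an independent Gaussian to each feature of one intermediate step.

    step_frames: (n_frames, n_features) activations from the demonstration
    frames assigned to this step. With features treated independently, the
    'learning rule' is just estimating a per-feature mean and variance.
    """
    mu = step_frames.mean(axis=0)
    var = step_frames.var(axis=0) + 1e-6  # small floor for numerical stability
    return mu, var

def step_log_likelihood(frame_feats, mu, var):
    """Log-probability of frames under the fitted step model.

    Because features are treated as independent, the joint log-likelihood is
    the sum of per-feature Gaussian log-densities; this sum can serve as an
    (unnormalized) perceptual reward for reaching that step.
    """
    log_p = -0.5 * (np.log(2 * np.pi * var) + (frame_feats - mu) ** 2 / var)
    return log_p.sum(axis=-1)

# Minimal usage with random stand-in features (real inputs would be
# intermediate activations of a pre-trained network on video frames).
rng = np.random.default_rng(0)
demo_step = rng.normal(size=(20, 128))     # 20 demo frames, 128 features
mu, var = fit_step_model(demo_step)
new_frames = rng.normal(size=(5, 128))     # 5 frames from a new rollout
print(step_log_likelihood(new_frames, mu, var))
```

The appeal of this factorized form is that it needs no trajectory-level optimization, so a handful of demonstrations is enough to produce a usable, dense reward signal.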
The empirical results are notable in both qualitative and quantitative evaluations: the learned reward functions performed reliably across different scenarios. For example, a robot learned the complex task of opening a door even though the demonstrations were executed by a human hand. Importantly, these results were achieved without any supervised labels, underscoring the method's contribution to vision-based reward learning.
Implications and Future Directions
Theoretically, this paper challenges the conventional reliance on extensive sensory feedback and numerous demonstrations for imitation learning. Building on pre-trained deep models offers a versatile and adaptive approach, allowing reward functions to be learned in diverse and dynamic environments.
Practically, this capability opens avenues for deploying RL in scenarios where instrumenting the environment with sensors is impractical or impossible, such as remote environments or mobile robots. The ability to derive reward functions directly from video footage offers a scalable solution for manipulation tasks in complex settings, thereby broadening the range of real-world RL applications.
Future research could explore extending the methods to handle variations in viewpoint and context, leveraging the model's principles for lifelong learning, and potentially incorporating more sophisticated model architectures to further advance RL efficiency and applicability. Expanding the unsupervised learning capacity to recognize a broader array of sub-goals could also enhance adaptability to unseen tasks.
In summary, this paper presents a robust framework that efficiently derives perceptual rewards from limited demonstration data without additional sensory inputs, thereby reducing the barriers to using RL in practical applications. The approach could serve as a cornerstone for future developments in imitation learning and potentially transform how complex real-world reinforcement learning problems are approached.