Reinforcement Learning with Videos: Combining Offline Observations with Interaction (2011.06507v2)

Published 12 Nov 2020 in cs.LG, cs.AI, cs.CV, and cs.RO

Abstract: Reinforcement learning is a powerful framework for robots to acquire skills from experience, but often requires a substantial amount of online data collection. As a result, it is difficult to collect sufficiently diverse experiences that are needed for robots to generalize broadly. Videos of humans, on the other hand, are a readily available source of broad and interesting experiences. In this paper, we consider the question: can we perform reinforcement learning directly on experience collected by humans? This problem is particularly difficult, as such videos are not annotated with actions and exhibit substantial visual domain shift relative to the robot's embodiment. To address these challenges, we propose a framework for reinforcement learning with videos (RLV). RLV learns a policy and value function using experience collected by humans in combination with data collected by robots. In our experiments, we find that RLV is able to leverage such videos to learn challenging vision-based skills with less than half as many samples as RL methods that learn from scratch.

Authors (5)
  1. Karl Schmeckpeper (19 papers)
  2. Oleh Rybkin (18 papers)
  3. Kostas Daniilidis (119 papers)
  4. Sergey Levine (531 papers)
  5. Chelsea Finn (264 papers)
Citations (98)

Summary

Reinforcement Learning with Videos: Combining Offline Observations with Interaction

The paper "Reinforcement Learning with Videos: Combining Offline Observations with Interaction" explores a novel approach to enhancing data efficiency in reinforcement learning (RL) for robotic tasks. The authors propose the Reinforcement Learning with Videos (RLV) framework to leverage observational data, specifically videos of humans, as an additional source of information in conjunction with traditional robotic interaction data. This approach addresses the inherent challenges posed by human video data, such as the lack of annotated actions and rewards, and significant visual domain shifts due to differences in embodiment between humans and robots.

RLV integrates human observational data into the RL loop through three key strategies: inferring missing actions, estimating rewards, and mitigating domain shift. The procedure maintains separate action-free and action-conditioned replay buffers, allowing the robot to update its policy using both the human and robotic datasets. Actions are inferred by training an inverse dynamics model on the robot's action-conditioned data and applying it to the human observations. Rewards for human videos are assigned heuristically based on terminal states, aligning the reinforcement learning objective with successful human demonstrations. Finally, adversarial domain adaptation is used to embed both human and robot observations into a domain-invariant feature space.
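As a rough illustration of this data flow, the sketch below labels action-free human transitions with actions from a learned inverse dynamics model and with heuristic terminal-state rewards, then mixes them with robot transitions for an off-policy update. The network sizes, reward values, batch fields, and the `agent.update` interface are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions noted inline) of the RLV-style data flow:
# label action-free human transitions with inferred actions and heuristic
# terminal-state rewards, then mix them with robot transitions for an
# off-policy update.
import torch
import torch.nn as nn


class InverseDynamicsModel(nn.Module):
    """Predicts the action that takes state s_t to s_{t+1}."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1))


def train_inverse_model(model, optimizer, robot_batch):
    """Fit the inverse model on action-conditioned robot transitions."""
    pred = model(robot_batch["obs"], robot_batch["next_obs"])
    loss = nn.functional.mse_loss(pred, robot_batch["act"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def label_human_batch(model, human_batch, success_reward=1.0, default_reward=0.0):
    """Infer actions for action-free human transitions and assign heuristic
    rewards: terminal (successful) frames get success_reward, all others get
    default_reward. The specific reward values are assumptions."""
    with torch.no_grad():
        actions = model(human_batch["obs"], human_batch["next_obs"])
    done = human_batch["terminal"].float()
    rewards = done * success_reward + (1.0 - done) * default_reward
    return {**human_batch, "act": actions, "rew": rewards}


def combined_update(agent, robot_batch, labeled_human_batch):
    """One off-policy update on a mixed batch; `agent` stands in for any
    actor-critic learner (e.g. SAC) operating on the shared features."""
    keys = ("obs", "act", "rew", "next_obs", "terminal")
    mixed = {k: torch.cat([robot_batch[k], labeled_human_batch[k]], dim=0)
             for k in keys}
    return agent.update(mixed)  # hypothetical agent interface
```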

Empirical evaluations demonstrate that RLV markedly improves the data efficiency of RL. Across the experimental scenarios, including the state-based Acrobot control task and more complex vision-based robotic manipulation tasks, RLV consistently reached successful policies with fewer samples than baselines and prior imitation-from-observation methods such as ILPO and BCO, requiring less than half as many samples as RL from scratch. RLV also proved robust to sub-optimal observational data, indicating that it can improve beyond the quality of the human demonstrations.

The experiments extended to environments with substantial visual domain shift, showcasing RLV's applicability to real-world human video data. The framework effectively handled tasks with considerable discrepancies in morphology, background, and viewpoint between the human and robot observations. By mitigating these domain shifts, RLV demonstrated significant potential for leveraging the vast amounts of readily available human video data in practical applications.
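The domain-invariant embedding described above can be sketched as a standard adversarial objective: a discriminator learns to distinguish human from robot features while the shared encoder is trained to make them indistinguishable. A gradient-reversal layer is one common way to realize this; the architecture, loss weighting, and whether the paper uses exactly this mechanism are assumptions here.

```python
# Sketch of adversarial domain alignment: the discriminator separates human
# from robot features, while reversed gradients push the shared encoder
# toward domain-invariant features. Sizes and weights are illustrative.
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder.
        return -ctx.scale * grad_output, None


class DomainAdversarialEncoder(nn.Module):
    """Shared encoder whose features are pushed toward domain invariance."""

    def __init__(self, obs_dim: int, feat_dim: int = 64, adv_weight: float = 1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        self.discriminator = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                           nn.Linear(64, 1))
        self.adv_weight = adv_weight

    def domain_loss(self, robot_obs: torch.Tensor, human_obs: torch.Tensor) -> torch.Tensor:
        """Binary cross-entropy on domain labels; the reversed gradient trains
        the encoder to confuse the discriminator."""
        feats = torch.cat([self.encoder(robot_obs), self.encoder(human_obs)], dim=0)
        feats = GradientReversal.apply(feats, self.adv_weight)
        logits = self.discriminator(feats).squeeze(-1)
        labels = torch.cat([torch.zeros(len(robot_obs)), torch.ones(len(human_obs))])
        return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```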

The implications of this research are considerable: RLV may facilitate broader use of human video data in robotic reinforcement learning, significantly decreasing training time and resource costs. The methodology is adaptable and can reduce reliance on extensive robotic interaction data, which is often difficult and costly to obtain. Theoretically, the framework extends the capability of RL to operate in settings with sparse rewards and challenging vision-based tasks. Future work should explore refining the domain adaptation techniques and extending the framework to a wider range of complex real-world environments.

In summary, the RLV framework constitutes a promising advance in reinforcement learning for robotic applications, effectively integrating human observational data to enhance learning efficiency and capability.
