Scaling Data-Driven Robotics with Reward Sketching and Batch Reinforcement Learning
This paper presents a method for scaling deep reinforcement learning (RL) in robotics, with the goal of developing control policies for diverse manipulation tasks. The work introduces a human annotation procedure called reward sketching and leverages batch (offline) reinforcement learning to improve the scalability and efficiency of training robotic systems.
Core Contributions
The paper makes several key contributions to the field:
- Reward Sketching: The authors introduce reward sketching, a technique in which human annotators quickly sketch per-timestep reward curves for recorded episodes. These sketches are used to train reward models for new tasks from human judgments, and the learned models then annotate historical data retrospectively, generating large labeled datasets without continuous robot interaction during learning.
- Batch Reinforcement Learning: The approach employs batch RL to learn vision-based manipulation policies entirely offline, removing the need for real-time data collection through robot interaction. This makes it easier to scale up RL in robotics and significantly reduces the wear and tear of running physical robots.
- Data-Driven Robotics: By continuously recording robot experience, the authors amassed a comprehensive dataset of multi-camera video and sensor readings. This ever-growing repository, referred to as NeverEnding Storage, persists robotic experience so it can be relabeled and reused for new tasks.
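To make the retrospective-labeling idea concrete, here is a minimal sketch (not the paper's actual pipeline): each human sketch assigns a reward in [0, 1] to every timestep of a recorded episode, and a reward model is then fit on those targets so that all stored frames can be labeled automatically. Random feature vectors stand in for camera images, and a plain least-squares model stands in for the paper's neural reward model; the episode generator and feature layout are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_episode(length=50, feat_dim=8):
    # Simulated human "sketch": annotated reward ramps up with task progress.
    sketch = np.linspace(0.0, 1.0, length)
    # Stand-in frame features; feature 0 carries a noisy progress cue so
    # there is something for the reward model to learn.
    feats = rng.normal(size=(length, feat_dim))
    feats[:, 0] = sketch + 0.05 * rng.normal(size=length)
    return feats, sketch

episodes = [make_episode() for _ in range(20)]

# Retrospective labeling: every stored frame receives a reward target from
# its episode's sketch, yielding a supervised dataset with no new robot time.
X = np.concatenate([f for f, _ in episodes])
y = np.concatenate([s for _, s in episodes])

# Minimal reward model: linear regression with a bias term.
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

def predict_reward(frame_feats):
    # Predicted reward for a batch of frames.
    return np.c_[frame_feats, np.ones(len(frame_feats))] @ w
```

Once trained, `predict_reward` can score any frame in storage, which is what allows old, unlabeled experience to be reused for a newly sketched task.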
Methodology
The paper outlines a systematic workflow involving several phases:
- Data Collection: All robot experience is recorded regardless of its quality or the task being attempted, with data sourced from human teleoperators, scripted policies, and trained agents.
- Reward Learning: Reward models are trained from human reward sketches, enabling automatic labeling of the large dataset stored in NeverEnding Storage.
- Policy Training: Policies are learned with batch RL methods, such as distributional RL, from the large pre-annotated datasets without further robot executions, allowing different RL algorithms to be compared during the training phase.
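The policy-training step can be illustrated with a toy example. The sketch below runs simple tabular batch Q-learning over a fixed dataset of logged transitions, rather than the distributional actor-critic agent the paper uses; the chain environment, reward, and hyperparameters are invented for illustration. The key property it demonstrates is the one the workflow relies on: the policy is learned purely from stored data, with no further robot execution.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 5, 2, 0.9

def step(s, a):
    # Toy chain MDP: action 1 moves right, action 0 moves left; reward is
    # given at the rightmost state. In the paper's setting, rewards would
    # come from the learned reward model, not a hand-crafted function.
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1)

# Logged dataset of (s, a, r, s') transitions from an arbitrary behavior
# policy -- the batch RL learner never interacts with the environment.
dataset = []
for _ in range(2000):
    s, a = int(rng.integers(n_states)), int(rng.integers(n_actions))
    s2, r = step(s, a)
    dataset.append((s, a, r, s2))

# Batch Q-learning: repeatedly sweep the fixed dataset.
Q = np.zeros((n_states, n_actions))
for _ in range(100):
    for s, a, r, s2 in dataset:
        target = r + gamma * Q[s2].max()
        Q[s, a] += 0.1 * (target - Q[s, a])

policy = Q.argmax(axis=1)  # greedy policy extracted offline
```

Because the dataset is fixed, different algorithms or hyperparameters can be evaluated against the same stored experience, which is what makes offline experimentation cheap relative to running physical robots.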
Results and Implications
The experimental results show that batch RL techniques can train policies robust enough to perform complex tasks like object manipulation and insertion. Notably, the trained agents demonstrate robust performance, in some cases surpassing the human teleoperators who supplied demonstrations. This indicates a promising direction for deploying RL-trained agents across diverse, practical robotic applications without requiring direct interaction or a hand-crafted reward function for each specific task.
The authors highlight several directions for further development:
- Scalability: The approach opens avenues for scaling robotic learning systems across various domains by leveraging accumulated data and reward sketching without the need for extensive live interaction.
- Generalization and Robustness: The system’s ability to generalize to unseen environments and conditions further affirms the potential of such approaches in achieving versatile robotic solutions.
In conclusion, this work makes a significant contribution to data-driven robotics by efficiently combining human expertise with batch reinforcement learning. The integration of reward sketching and batch RL addresses limitations that previously hindered the large-scale application of RL in robotics, laying the groundwork for future enhancements and applications in artificial intelligence and autonomous systems.