- The paper demonstrates an innovative end-to-end approach that uses a recurrent neural network with VAE-GAN regularization for multi-task robotic manipulation.
- The study shows that joint multi-task learning with shared parameters significantly improves success rates, achieving up to 88% on certain real-world tasks.
- The method leverages behavior cloning from human demonstrations to enable reliable, vision-based task execution without the overhead of reinforcement learning.
Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
The paper by Rahmatizadeh, Abolghasemi, Boloni, and Levine presents an innovative method for enabling inexpensive robotic systems to learn and perform multiple manipulation tasks through vision-based, end-to-end learning. The approach centers on training a recurrent neural network that takes raw camera images of the environment together with a task-selector input and predicts the robot's joint configuration for the next time step.
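To make that controller structure concrete, the following is a minimal sketch of a recurrent visuomotor controller of the kind described: a CNN encodes each camera frame, the embedding is concatenated with a one-hot task selector, and an LSTM predicts the next joint configuration. The class name, layer sizes, and 64x64 input resolution are illustrative assumptions, not the paper's exact architecture (which additionally relies on the VAE-GAN component discussed below).

```python
import torch
import torch.nn as nn


class RecurrentVisuomotorController(nn.Module):
    """Hypothetical controller: CNN image encoder + task selector -> LSTM -> joint outputs."""

    def __init__(self, num_tasks: int = 3, num_joints: int = 6, hidden: int = 128):
        super().__init__()
        # Small convolutional encoder for raw camera frames (assumed 64x64 RGB).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
        )
        # Recurrent core: image embedding concatenated with the one-hot task selector.
        self.lstm = nn.LSTM(input_size=128 + num_tasks, hidden_size=hidden,
                            batch_first=True)
        # Output head: joint configuration predicted for the next time step.
        self.head = nn.Linear(hidden, num_joints)

    def forward(self, images, task_onehot, state=None):
        # images: (batch, time, 3, 64, 64); task_onehot: (batch, num_tasks)
        b, t = images.shape[:2]
        z = self.encoder(images.reshape(b * t, *images.shape[2:])).reshape(b, t, -1)
        task = task_onehot.unsqueeze(1).expand(-1, t, -1)
        h, state = self.lstm(torch.cat([z, task], dim=-1), state)
        return self.head(h), state


# Example rollout on random data: batch of 2 clips, 5 frames each, tasks 0 and 1.
joints, _ = RecurrentVisuomotorController()(torch.randn(2, 5, 3, 64, 64), torch.eye(3)[:2])
```

Keeping the task selector as an explicit input to a single shared network is what lets one set of weights serve all tasks at once.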
The key contribution of this research is the demonstration of effective task execution on a low-cost robot that lacks reliable proprioceptive sensing and must therefore rely solely on visual input. The controller integrates a VAE-GAN-based prediction model that enhances generalization through weight sharing and reconstruction-based regularization. The method learns diverse tasks, such as picking up a towel, wiping surfaces, and correctly depositing items, by direct behavior cloning from human demonstrations, thus bypassing the computational and time demands of reinforcement learning.
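As an illustration of this behavior-cloning setup, the sketch below regresses the predicted joint trajectory onto the demonstrated one and adds an auxiliary image-reconstruction term as a simplified stand-in for the paper's VAE-GAN regularization. The decoder architecture, loss weighting, and the `training_step` helper are assumptions for illustration (reusing the hypothetical controller sketched above), not the authors' implementation.

```python
import torch
import torch.nn as nn

# Shared controller from the previous sketch plus an assumed image decoder that
# mirrors its encoder; plain reconstruction stands in for the VAE-GAN regularizer.
controller = RecurrentVisuomotorController(num_tasks=3, num_joints=6)
decoder = nn.Sequential(
    nn.Linear(128, 64 * 8 * 8), nn.ReLU(), nn.Unflatten(1, (64, 8, 8)),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, 2, 1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(
    list(controller.parameters()) + list(decoder.parameters()), lr=1e-3)


def training_step(images, task_onehot, demo_joints, recon_weight=0.1):
    """images: (B,T,3,64,64); task_onehot: (B,num_tasks); demo_joints: (B,T,num_joints)."""
    pred_joints, _ = controller(images, task_onehot)
    bc_loss = nn.functional.mse_loss(pred_joints, demo_joints)  # imitate the demonstration
    b, t = images.shape[:2]
    z = controller.encoder(images.reshape(b * t, *images.shape[2:]))
    recon = decoder(z).reshape(b, t, *images.shape[2:])
    recon_loss = nn.functional.mse_loss(recon, images)          # reconstruction regularizer
    loss = bc_loss + recon_weight * recon_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point of the auxiliary term is the same as in the paper: forcing the image embedding to retain enough information to reconstruct the scene regularizes the features the policy acts on, rather than adding any reward signal.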
The work emphasizes multi-task training of a single neural network, achieving positive transfer across tasks through parameter sharing. The experiments show that jointly training on varied tasks, including manipulation of rigid, jointed, and deformable objects, yields higher success rates than training a distinct model for each task. The controller generates action trajectories from visual input alone, generalizing from the training data to produce reliable task executions.
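A rough sketch of how such joint multi-task training can be organized follows: demonstrations from all tasks are pooled, each batch is drawn from one task, and the task identity is passed to the shared network as a one-hot selector. The dataset layout, batch size, and reuse of the hypothetical `training_step` helper from the previous sketch are assumptions, not details from the paper.

```python
import random
import torch

NUM_TASKS = 3  # e.g. pick-and-place, plate pushing, towel manipulation


def sample_batch(demos_by_task, batch_size=8):
    """demos_by_task[k]: list of (images, joints) tensor pairs for task k, equal-length clips."""
    task_id = random.randrange(NUM_TASKS)
    demos = random.sample(demos_by_task[task_id], batch_size)
    images, joints = zip(*demos)
    onehot = torch.zeros(batch_size, NUM_TASKS)
    onehot[:, task_id] = 1.0  # the task selector tells the shared network which task this is
    return torch.stack(images), onehot, torch.stack(joints)


def train(demos_by_task, steps=10_000):
    for step in range(steps):
        images, onehot, joints = sample_batch(demos_by_task)
        loss = training_step(images, onehot, joints)  # one set of weights across all tasks
        if step % 500 == 0:
            print(f"step {step}: loss {loss:.4f}")
```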
Experimentally, the research focuses on tasks relevant to Activities of Daily Living (ADLs), targeting applicability in assistive robotics. The tasks, executed by an inexpensive Lynxmotion AL5D robotic arm, include picking and placing objects, pushing plates, and manipulating deformable items, reflecting real-world scenarios pertinent to assistance for disabled or elderly users. Notably, the results indicate high success rates, with the multi-task network achieving up to 88% success on certain tasks, demonstrating the system's reliability.
From a practical standpoint, this research marks a significant step toward making sophisticated robotic manipulation feasible at low cost, relying on visual input from ubiquitous, inexpensive cameras. Theoretically, it proposes an effective model for integrating VAE-GAN components with recurrent architectures for robot learning, enriching the discourse on vision-based robotics. Future work could explore dataset aggregation or reinforcement learning strategies to further refine task performance, and extending the approach to more complex task environments could push the boundaries of affordable robotic autonomy. The approach outlined here could inspire future development of general-purpose robotic assistants, democratizing access to automation.