
Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning (2204.02863v2)

Published 6 Apr 2022 in cs.RO, cs.AI, and cs.CV

Abstract: We present DOME, a novel method for one-shot imitation learning, where a task can be learned from just a single demonstration and then be deployed immediately, without any further data collection or training. DOME does not require prior task or object knowledge, and can perform the task in novel object configurations and with distractors. At its core, DOME uses an image-conditioned object segmentation network followed by a learned visual servoing network, to move the robot's end-effector to the same relative pose to the object as during the demonstration, after which the task can be completed by replaying the demonstration's end-effector velocities. We show that DOME achieves near 100% success rate on 7 real-world everyday tasks, and we perform several studies to thoroughly understand each individual component of DOME. Videos and supplementary material are available at: https://www.robot-learning.uk/dome .

Citations (37)

Summary

  • The paper introduces DOME, which enables immediate robotic manipulation after a single demonstration using image-conditioned visual servoing.
  • It leverages simulation-trained segmentation and servoing networks with domain randomization to robustly isolate target objects despite distractors.
  • The method achieves near-perfect success rates across seven manipulation tasks, outperforming baselines like Behavioral Cloning and Residual RL.

One-Shot Imitation Learning for Visual Servoing: An Analysis of DOME

This paper introduces "Demonstrate Once, Imitate Immediately" (DOME), a novel approach to one-shot imitation learning for robotic manipulation tasks. DOME represents a significant advancement in imitation learning by enabling immediate task execution after a single demonstration, without requiring additional data collection or training. This emphasis on data efficiency and immediate deployment contrasts with traditional imitation learning techniques, which often require extensive datasets and multiple demonstrations to achieve competent task performance.

Core Methodology

DOME's methodology centers on image-conditioned visual servoing for precise manipulation. During the demonstration, the system captures an image of the target object from the bottleneck pose. At deployment, a learned visual servoing network moves the robot's end-effector (EE) until the live image matches this bottleneck image, recovering the same relative pose to the object as in the demonstration. The task is then completed by replaying the demonstration's EE velocities. This architecture requires no prior knowledge of the object or task, which underlies its robustness to novel object configurations and environmental distractors.
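The two-phase deployment described above can be sketched as a small control loop. Everything here is illustrative: `predict_velocity` stands in for the learned servoing network, and the callbacks abstract the robot interface; none of these names come from the paper's codebase.

```python
import numpy as np

def servo_to_bottleneck(get_image, predict_velocity, apply_velocity,
                        max_steps=200, eps=1e-3):
    """Phase 1: closed-loop visual servoing toward the bottleneck pose.

    get_image():            returns the current wrist-camera observation
    predict_velocity(img):  learned servoing network (hypothetical API) that
                            outputs an EE velocity reducing the pose error
    apply_velocity(v):      commands the end-effector for one control step
    Returns True once the predicted velocity is ~zero, i.e. the live image
    matches the bottleneck image.
    """
    for _ in range(max_steps):
        v = predict_velocity(get_image())
        if np.linalg.norm(v) < eps:   # network outputs ~zero when aligned
            return True
        apply_velocity(v)
    return False

def replay_demo(demo_velocities, apply_velocity):
    """Phase 2: open-loop replay of the demonstration's EE velocities."""
    for v in demo_velocities:
        apply_velocity(v)
```

Because phase 1 re-establishes the demonstrated object-relative pose, the open-loop replay in phase 2 is valid even when the object has moved between demonstration and deployment.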

Experimental Validation and Results

DOME was validated on seven real-world manipulation tasks, including object lifting, peg-in-hole insertion, and container lid opening, achieving near-perfect success rates even amidst distractor objects. This robustness is largely attributable to the image-conditioned segmentation network, which isolates the object of interest from distracting elements in the deployment environment. DOME also outperforms baselines based on Behavioral Cloning (BC) and Residual Reinforcement Learning (RRL), which failed to accomplish the tasks due to insufficient data (BC) or ineffective exploration (RRL).

Learned Components and Simulation Training

The paper details the training of DOME's segmentation and servoing networks entirely in simulation on diverse datasets. Training employs domain randomization, varying lighting, object textures, and distractor configurations, to mitigate the sim-to-real gap and ensure robust real-world performance. The segmentation network uses a FiLM-based architecture to condition on the bottleneck image, which outperformed alternative conditioning architectures in ablation studies.
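FiLM (feature-wise linear modulation) conditions one network on another's output by predicting a per-channel scale and shift for intermediate feature maps. The sketch below shows the core operation in NumPy under simplifying assumptions: the conditioning embedding (here, of the bottleneck image) is projected linearly to gamma and beta, which is illustrative rather than the paper's exact parameterisation.

```python
import numpy as np

def film(features, cond, W_gamma, W_beta):
    """Feature-wise linear modulation of a convolutional feature map.

    features: (C, H, W) feature map from the segmentation backbone
    cond:     (D,) embedding of the conditioning image (bottleneck view)
    W_gamma, W_beta: (C, D) linear projections producing per-channel
                     scale and shift (hypothetical parameterisation)
    """
    gamma = W_gamma @ cond                  # (C,) per-channel scale
    beta = W_beta @ cond                    # (C,) per-channel shift
    return gamma[:, None, None] * features + beta[:, None, None]
```

Because the modulation is only per-channel, FiLM adds very few parameters while letting the conditioning image steer which features the segmentation network attends to, e.g. suppressing channels that respond to distractor objects.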

Implications and Future Directions

DOME's paradigm of immediate deployment after a single demonstration sets a new precedent for practical robotics, especially in dynamic and unfamiliar environments. It could significantly alter operational paradigms in industries that rely on flexible automation, such as manufacturing and personal robotics. Future research could scale the system to more complex object interactions and to objects that differ between demonstration and execution. Additionally, automating the determination of the bottleneck pose would make DOME more accessible to non-expert users in everyday settings. Addressing these areas could position DOME as a cornerstone framework in robotic manipulation.
