You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration (2201.12716v2)

Published 30 Jan 2022 in cs.RO, cs.AI, cs.CV, cs.SY, and eess.SY

Abstract: Promising results have been achieved recently in category-level manipulation that generalizes across object instances. Nevertheless, it often requires expensive real-world data collection and manual specification of semantic keypoints for each object category and task. Additionally, coarse keypoint predictions and ignoring intermediate action sequences hinder adoption in complex manipulation tasks beyond pick-and-place. This work proposes a novel, category-level manipulation framework that leverages an object-centric, category-level representation and model-free 6 DoF motion tracking. The canonical object representation is learned solely in simulation and then used to parse a category-level, task trajectory from a single demonstration video. The demonstration is reprojected to a target trajectory tailored to a novel object via the canonical representation. During execution, the manipulation horizon is decomposed into long-range, collision-free motion and last-inch manipulation. For the latter part, a category-level behavior cloning (CatBC) method leverages motion tracking to perform closed-loop control. CatBC follows the target trajectory, projected from the demonstration and anchored to a dynamically selected category-level coordinate frame. The frame is automatically selected along the manipulation horizon by a local attention mechanism. This framework allows teaching different manipulation strategies by solely providing a single demonstration, without complicated manual programming. Extensive experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly, which involve learning complex, long-horizon policies. The process exhibits robustness against uncertainty due to dynamics as well as generalization across object instances and scene configurations. The supplementary video is available at https://www.youtube.com/watch?v=WAr8ZY3mYyw

Authors (4)
  1. Bowen Wen (33 papers)
  2. Wenzhao Lian (14 papers)
  3. Kostas Bekris (36 papers)
  4. Stefan Schaal (73 papers)
Citations (76)

Summary

  • The paper introduces a framework leveraging a novel category-level object representation via NUNOCS for robust 6D pose estimation and skill transfer.
  • It employs model-free motion tracking and local attention-based category-level behavior cloning (CatBC) to achieve precise, closed-loop control during dynamic manipulation tasks.
  • Simulation-based synthetic training with domain randomization enables rapid adaptation across diverse objects, excelling in tasks like gear insertion and battery assembly.

Analyzing "You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration"

The paper "You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration" introduces a novel manipulation framework that leverages visual feedback to teach robots complex manipulation tasks with just a single demonstration. This framework circumvents several challenges traditionally associated with category-level manipulation such as extensive training data collection, manual specification of keypoints, and accommodating dynamics uncertainties during manipulation. It addresses the need for generalization across object instances within the same category using synthetic training and a robust visual representation technique, Non-Uniform Normalized Object Coordinate Space (NUNOCS).

Key Contributions and Techniques

  1. Category-Level Representation via NUNOCS: The framework employs a category-level object representation based on NUNOCS, which establishes dense point correspondences across different object instances. This enables robust 6D pose estimation and transfer of manipulation skills within a category without requiring instance CAD models; a minimal pose-and-scale fitting sketch follows this list.
  2. Visual Feedback through Model-Free Motion Tracking: A model-free 6 DoF object motion tracker (BundleTrack) parses the single demonstration video and provides visual feedback during execution. This feedback enables closed-loop control that keeps the object on the desired trajectory despite dynamics uncertainty during manipulation.
  3. Behavior Cloning with Local Attention Mechanism: The paper introduces Category-level Behavior Cloning (CatBC), which follows a target trajectory computed from the demonstration video. A local attention mechanism dynamically selects the task-relevant anchor frame along the horizon, providing the precision needed for manipulation while accommodating variation in object scale and geometry; a simplified control-step sketch follows this list.
  4. Simulation-Based Training Approach: Using Blender for synthetic data generation, the system learns the NUNOCS representation entirely in simulation. Domain randomization and depth-image alignment bridge the sim-to-real gap, drastically reducing the human effort and cost of physical data collection; a small randomization-sampling sketch also follows this list.
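
As a concrete illustration of the first contribution, here is a minimal sketch of how a 9-DoF transform (rotation, translation, and per-axis scale) could be recovered once a NUNOCS-style network has predicted canonical coordinates for each observed point. The alternating least-squares estimator and all names below are illustrative assumptions; the paper's actual fitting procedure may differ.

```python
import numpy as np

def fit_9dof(obs, canon, iters=20):
    """Fit obs ~= R @ (s * canon) + t given dense correspondences between
    observed points `obs` (N x 3, camera frame) and predicted canonical
    NUNOCS coordinates `canon` (N x 3). Returns rotation R, per-axis scale s,
    and translation t. Alternating least squares; an illustrative sketch only."""
    s = np.ones(3)
    for _ in range(iters):
        # Rotation and translation via Kabsch, holding the scale fixed.
        src = canon * s
        src_c = src - src.mean(axis=0)
        obs_c = obs - obs.mean(axis=0)
        U, _, Vt = np.linalg.svd(obs_c.T @ src_c)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
        R = U @ D @ Vt
        t = obs.mean(axis=0) - R @ src.mean(axis=0)
        # Per-axis scale via least squares, holding R and t fixed.
        back = (obs - t) @ R  # observations expressed in canonical axes
        s = (back * canon).sum(axis=0) / np.maximum((canon ** 2).sum(axis=0), 1e-9)
    return R, s, t

# Synthetic sanity check: fit a known per-axis-scaled rigid transform.
rng = np.random.default_rng(0)
canon = rng.uniform(0.0, 1.0, size=(500, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1.0
obs = (canon * np.array([1.5, 0.8, 2.0])) @ R_true.T + np.array([0.1, -0.2, 0.3])
R, s, t = fit_9dof(obs, canon)
```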

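To make the closed-loop "last-inch" stage concrete, the following is a simplified sketch of one CatBC-style control step: the pose reported by the motion tracker serves as the feedback signal, candidate category-level anchor frames are weighted by their proximity along the manipulation horizon (a toy stand-in for the learned local attention), and the next waypoint is recomposed in the selected anchor frame. Data layouts and names are assumptions for illustration, not the paper's interfaces.

```python
import numpy as np

def select_anchor(step, anchor_steps, temperature=2.0):
    """Toy stand-in for the local attention mechanism: weight candidate
    category-level anchor frames by how close they sit to the current point
    of the manipulation horizon and return the strongest index."""
    d = np.abs(np.asarray(anchor_steps, dtype=float) - step)
    weights = np.exp(-d / temperature)
    return int(np.argmax(weights / weights.sum()))

def catbc_step(tracked_pose, step, rel_waypoints, novel_anchors, anchor_steps):
    """One closed-loop control step (all poses are 4x4 homogeneous matrices).

    rel_waypoints[k][step] : demonstrated waypoint expressed in anchor frame k
    novel_anchors[k]       : anchor frame k located on the novel object instance
    tracked_pose           : current object pose reported by the motion tracker
    Returns the corrective transform that, applied to the tracked pose,
    moves the object onto the target waypoint.
    """
    k = select_anchor(step, anchor_steps)
    target_world = novel_anchors[k] @ rel_waypoints[k][step]
    return target_world @ np.linalg.inv(tracked_pose)
```
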
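For the synthetic-training contribution, here is a minimal sketch of per-frame domain randomization; the parameter set and ranges are assumptions rather than the paper's exact configuration.

```python
import numpy as np

def sample_scene_config(rng):
    """Sample one randomized synthetic scene, in the spirit of the domain
    randomization described above. Parameters and ranges are illustrative."""
    return {
        "object_scale_xyz": rng.uniform(0.7, 1.3, size=3).tolist(),  # per-axis shape variation
        "camera_distance_m": float(rng.uniform(0.4, 1.0)),
        "camera_elevation_deg": float(rng.uniform(15.0, 75.0)),
        "light_intensity": float(rng.uniform(0.3, 3.0)),
        "background_texture_id": int(rng.integers(0, 1000)),
        "depth_noise_std_m": float(rng.uniform(0.0, 0.004)),  # simulated sensor noise
    }

rng = np.random.default_rng(0)
configs = [sample_scene_config(rng) for _ in range(10_000)]  # one config per rendered frame
```
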
Experimental Setup and Results

The researchers conducted extensive real-world evaluations using two categories of objects, Gears and Batteries, in a range of manipulation tasks, including standing, assembly, and high-precision insertion. The approach consistently outperformed baseline methods across these tasks.

  • Battery Assembly Task: The proposed method achieved notable success in a challenging sequential task where precision and dynamic stability are crucial. Enhanced closed-loop control provided robustness against dynamics uncertainty from interactions with springs and receptacles.
  • Gear Insertion Task: Success rates varied with the fitting tolerance, and the method maintained high precision even at extremely tight tolerances.

Implications and Future Developments

The research has profound implications for automation in manufacturing and other industrial applications, where fast adaptation to new tasks and objects is crucial. The successful integration of visual feedback for dynamically varying tasks suggests potential improvements in robotic applications across unstructured environments. Future developments could explore further integration with tactile sensing for enhanced manipulation in contact-rich scenarios, as suggested by the failure modes encountered in the experiments.

Overall, the paper provides valuable insights into robotic manipulation, emphasizing the importance of adopting robust visual object representations and closed-loop control techniques. These advancements offer promising directions for teaching robots complex tasks through simplified, effective demonstrations.
