ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection (2405.03666v1)

Published 6 May 2024 in cs.RO and cs.AI

Abstract: Bimanual manipulation is a longstanding challenge in robotics due to the large number of degrees of freedom and the strict spatial and temporal synchronization required to generate meaningful behavior. Humans learn bimanual manipulation skills by watching other humans and by refining their abilities through play. In this work, we aim to enable robots to learn bimanual manipulation behaviors from human video demonstrations and fine-tune them through interaction. Inspired by seminal work in psychology and biomechanics, we propose modeling the interaction between two hands as a serial kinematic linkage -- as a screw motion, in particular, that we use to define a new action space for bimanual manipulation: screw actions. We introduce ScrewMimic, a framework that leverages this novel action representation to facilitate learning from human demonstration and self-supervised policy fine-tuning. Our experiments demonstrate that ScrewMimic is able to learn several complex bimanual behaviors from a single human video demonstration, and that it outperforms baselines that interpret demonstrations and fine-tune directly in the original space of motion of both arms. For more information and video results, https://robin-lab.cs.utexas.edu/ScrewMimic/

Summary

  • The paper presents ScrewMimic, which abstracts complex bimanual human motion into screw actions to enable efficient robotic imitation.
  • It employs a deep learning model based on PointNet and a self-supervised fine-tuning loop to reliably predict and refine screw actions.
  • Experimental results on tasks like bottle opening and stirring illustrate the practical potential for advancing dual-arm robotic manipulation.

Understanding "ScrewMimic": Learning Bimanual Robot Manipulation from Human Videos

The Quest for Bimanual Coordination

Bimanual manipulation, the use of both arms in a synchronized fashion to perform tasks, remains a significant challenge in robotics. The difficulty arises from the large number of degrees of freedom involved and the precise spatial and temporal coordination required between the two arms.

The paper introduces ScrewMimic, a system that teaches robots bimanual tasks by imitating human actions from video. The system relies on a simplified motion representation called screw actions, which condenses the complex movement of two arms into a small set of parameters defining how one arm moves relative to the other.

Key Concepts: Screw Theory and Screw Actions

Screw theory is the cornerstone of the proposed method. Its central result is that any rigid-body motion in 3D can be decomposed into a rotation about, and a translation along, a single line: the screw axis. This forms the basis for screw actions, which are defined by parameters such as the screw axis and the grasping points, encapsulating the coordinated motion of both arms in a single representation.
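
To make this concrete, here is a minimal sketch of the rigid transform induced by a screw motion. The parameterization used (a point on the axis, a unit direction, a pitch, and a rotation angle) is a standard screw-theory convention chosen here for illustration; the paper defines its own screw-action parameters, which also include grasping points.

```python
import numpy as np

def screw_transform(q, s_hat, h, theta):
    """Homogeneous transform for a screw motion: rotate by `theta` about
    the axis through point `q` with unit direction `s_hat`, while
    translating h * theta along that axis (`h` is the pitch; h = 0 gives
    a pure rotation, as when opening a jar lid)."""
    s_hat = np.asarray(s_hat, float)
    s_hat = s_hat / np.linalg.norm(s_hat)
    # Rodrigues' formula for the rotation about s_hat.
    K = np.array([[0.0, -s_hat[2], s_hat[1]],
                  [s_hat[2], 0.0, -s_hat[0]],
                  [-s_hat[1], s_hat[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    # Translation: account for the axis not passing through the origin,
    # then add the pitch translation along the axis.
    t = (np.eye(3) - R) @ np.asarray(q, float) + h * theta * s_hat
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pure rotation (pitch 0) of 90 degrees about a z-parallel axis through (1, 0, 0):
T = screw_transform(q=[1.0, 0.0, 0.0], s_hat=[0.0, 0.0, 1.0], h=0.0, theta=np.pi / 2)
p = T @ np.array([2.0, 0.0, 0.0, 1.0])  # the point (2, 0, 0) moves to (1, 1, 0)
```

In this picture, one hand typically anchors the object while the other moves along such a screw, so a single set of these parameters describes the coordinated motion of both arms.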

Operational Breakdown of ScrewMimic

ScrewMimic operates in several stages:

  1. Extraction from Human Demonstration: The system first interprets an RGB-D video of a human performing a bimanual task, using machine vision techniques to track the movements and interactions of the hands with objects. It then abstracts these movements into a screw action by identifying the axis and type of screw motion involved.
  2. Prediction Model: Leveraging deep learning, specifically a network based on PointNet, ScrewMimic predicts screw actions from new scenes and object configurations. This prediction facilitates the robot's initial attempt at replicating the task using its arms.
  3. Self-Supervised Fine-Tuning: Not all initial attempts are successful due to discrepancies between human and robot morphologies and potential inaccuracies in action prediction. ScrewMimic refines its strategies through a self-supervised process, assessing each attempt based on specific performance metrics and adjusting the action model towards better outcomes.
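
The fine-tuning stage (step 3) can be illustrated with a toy random-search loop over the screw-action parameters. Here `execute` and `score` are placeholders standing in for the real robot rollout and the paper's task-specific performance metrics, and the Gaussian-perturbation scheme is an assumption for illustration, not the paper's actual optimizer.

```python
import numpy as np

def fine_tune(execute, score, init_params, n_iters=200, noise=0.02, seed=0):
    """Toy self-supervised refinement: perturb the predicted screw-action
    parameters and keep a perturbation whenever the scored outcome improves.
    `execute` runs the action (robot rollout) and `score` rates the result."""
    rng = np.random.default_rng(seed)
    best = np.asarray(init_params, float)
    best_score = score(execute(best))
    for _ in range(n_iters):
        candidate = best + rng.normal(0.0, noise, size=best.shape)
        s = score(execute(candidate))
        if s > best_score:  # keep only improvements
            best, best_score = candidate, s
    return best, best_score

# Toy usage: the "robot" just returns the parameters, and the score is
# distance to a hypothetical optimal parameter vector.
target = np.array([0.3, -0.1, 0.25])
initial = target + 0.2  # imperfect initial prediction, as after step 2
refined, _ = fine_tune(execute=lambda p: p,
                       score=lambda outcome: -np.linalg.norm(outcome - target),
                       init_params=initial)
```

The key property this loop shares with the paper's approach is that the search happens in the low-dimensional screw-action space rather than over full dual-arm trajectories, which is what makes autonomous refinement tractable.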

Trial Results and Insights

The practical application of ScrewMimic demonstrated its ability to successfully complete several bimanual tasks such as opening a bottle, zipping a jacket, and stirring contents in a pot. The experiments highlighted the effectiveness of screw action representations in reducing the complexity of the task and enabling efficient learning and fine-tuning from video demonstrations.

A critical element in the success of ScrewMimic was the use of a self-improving loop where the robot could iteratively enhance its performance and adapt its strategies based on autonomous feedback.

Looking Forward: Implications and Speculations

The research provides a compelling method for advancing robot autonomy in handling complex bimanual tasks, suggesting broader applications in industrial and domestic settings where dual-arm coordination can enhance efficiency and effectiveness.

Future developments might focus on improving the robustness and accuracy of movement interpretation from human demonstrations, further bridging the gap between human flexibility and robotic precision. Additionally, integrating more advanced forms of learning and reasoning could enable robots to take on increasingly sophisticated and sensitive tasks that rely on delicate manipulations.

In conclusion, while ScrewMimic offers a promising step toward sophisticated robotic manipulation mimicry of human activities, the full potential of these techniques will unfold as they are refined and integrated into more complex systems, pushing the boundaries of what autonomous robots can achieve in real-world scenarios.