
HumanPlus: Humanoid Shadowing and Imitation from Humans

Published 15 Jun 2024 in cs.RO, cs.AI, cs.CV, cs.LG, cs.SY, and eess.SY | (2406.10454v1)

Abstract: One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn autonomous skills from egocentric vision. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets. This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only an RGB camera, i.e. shadowing. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the data collected, we then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills. We demonstrate the system on our customized 33-DoF 180cm humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot with 60-100% success rates using up to 40 demonstrations. Project website: https://humanoid-ai.github.io/


Summary

  • The paper introduces a novel framework that integrates reinforcement learning with behavior cloning to transfer human motion to autonomous humanoids.
  • The system uses 40 hours of human activity data and a Humanoid Shadowing Transformer to enable real-time tracking and teleoperation of a 33-DoF robot.
  • The approach achieves 60–100% task success rates across complex tasks, reducing completion times and enhancing robust performance over traditional methods.

The paper "HumanPlus: Humanoid Shadowing and Imitation from Humans" presents a comprehensive system for leveraging human data to enhance humanoid robot capabilities. The primary objective is to address the challenges associated with transferring human motion and skills to humanoid robots, particularly in the context of perception, control, and morphology differences. This system encompasses the full spectrum from data collection to deployment of autonomous humanoid tasks, demonstrating significant advancements in humanoid robotics.

System Architecture and Methodology

The proposed HumanPlus system integrates several key components into a full-stack framework that includes shadowing and imitation capabilities for humanoid robots. Initially, the process involves training a low-level policy using reinforcement learning (RL) within a simulation. This policy leverages existing human motion datasets, encompassing 40 hours of diverse human activities, to develop a task-agnostic control mechanism known as the Humanoid Shadowing Transformer. This mechanism enables real-time tracking of human body and hand motions using a standard RGB camera, facilitating the teleoperation of humanoids through a process referred to as shadowing.
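The shadowing loop can be pictured as retargeting an estimated human pose onto the robot's joints and tracking the result with low-level control. The sketch below is a deliberately simplified stand-in: the paper's Humanoid Shadowing Transformer is a learned RL policy, whereas here retargeting is reduced to joint-limit clamping and tracking to PD control; all function names and gain values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def retarget_pose(human_joint_angles, joint_limits):
    """Map estimated human joint angles onto the robot's joints by
    clamping to the robot's joint limits. (Simplistic stand-in: the
    paper's learned policy handles the morphology gap instead.)"""
    low, high = joint_limits
    return np.clip(human_joint_angles, low, high)

def pd_torques(q_target, q, qd, kp=50.0, kd=2.0):
    """PD control toward the retargeted joint targets."""
    return kp * (q_target - q) - kd * qd

# Toy shadowing step for a 3-joint subset (illustrative values).
limits = (np.full(3, -1.5), np.full(3, 1.5))
human_pose = np.array([0.4, -2.0, 1.0])   # -2.0 exceeds the robot limit
q, qd = np.zeros(3), np.zeros(3)

target = retarget_pose(human_pose, limits)
tau = pd_torques(target, q, qd)
print(target)  # second joint is clamped to the robot's -1.5 limit
```

In the real system this loop runs in real time, with the human pose coming from a single RGB camera rather than a fixed array.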

Human operators can utilize this shadowing capability to collect whole-body data, effectively enabling the humanoid to gather sensory inputs for various tasks in real-world environments. Utilizing the shadowing data, the system employs supervised behavior cloning to develop skill policies based on egocentric vision. The resulting model, referred to as the Humanoid Imitation Transformer, is capable of autonomously executing tasks by imitating captured human skills.
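At its core, the behavior-cloning step is supervised regression from egocentric observations to the actions the teleoperator demonstrated. The minimal sketch below replaces the paper's Humanoid Imitation Transformer with a linear policy trained by gradient descent on synthetic observation-action pairs; the data, model, and hyperparameters are all illustrative assumptions, chosen only to show the supervised-learning structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstrations: (observation, action) pairs as collected
# via shadowing; a toy linear relation stands in for real robot data.
obs = rng.normal(size=(200, 4))
true_W = rng.normal(size=(4, 2))
actions = obs @ true_W

W = np.zeros((4, 2))                 # policy parameters
for _ in range(500):                 # supervised regression on the demos
    pred = obs @ W                   # policy's predicted actions
    grad = obs.T @ (pred - actions) / len(obs)
    W -= 0.1 * grad                  # mean-squared-error gradient step

print(np.abs(obs @ W - actions).max())  # residual error after training
```

The actual system swaps the linear map for a transformer conditioned on egocentric camera images, but the loss and training loop follow the same behavior-cloning pattern.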

Implementation and Demonstrations

The system's capabilities are showcased on a customized 33-DoF, 180cm humanoid robot. Demonstrated tasks include wearing shoes to stand and walk, unloading objects from warehouse racks, folding sweatshirts, rearranging objects, typing, and greeting another robot. Success rates for these tasks range from 60% to 100%, with up to 40 demonstrations used for training.

The robot's hardware includes two egocentric RGB cameras for capturing visual data, along with dexterous hands, facilitating interactions closely aligned with human capabilities. Real-time human motion is estimated using WHAM and HaMeR methodologies for body and hand pose estimation.
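Conceptually, the perception side combines a body-pose estimate (WHAM) and a hand-pose estimate (HaMeR) into a single whole-body target matching the robot's 33 degrees of freedom. The stubs below are placeholders, not the real model APIs: the actual estimators run neural networks on RGB frames, and the 23/10 joint split is an assumed illustrative partition, not the paper's exact layout.

```python
import numpy as np

# Stub estimators standing in for WHAM (body) and HaMeR (hands); the
# real models process RGB frames, these return fixed-size pose vectors.
def estimate_body_pose(frame):
    return np.zeros(23)   # e.g. 23 body joint targets (assumed split)

def estimate_hand_pose(frame):
    return np.zeros(10)   # e.g. 10 finger joint targets (assumed split)

def whole_body_target(frame):
    """Concatenate body and hand estimates into one 33-DoF target
    vector, matching the robot's degrees of freedom."""
    return np.concatenate([estimate_body_pose(frame),
                           estimate_hand_pose(frame)])

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder RGB frame
target = whole_body_target(frame)
print(target.shape)  # one target entry per robot DoF
```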

Comparative Analysis and Numerical Results

Quantitative results reveal the superiority of this approach over several existing teleoperation methods. The system exhibits reduced task completion times, increased robustness in task execution, and enhanced stability during teleoperation compared to kinesthetic teaching, ALOHA-style teleoperation, and Meta Quest-based interfaces. The Humanoid Imitation Transformer demonstrates improved success rates, particularly in complex tasks like shoe-wearing and walking, compared to monocular variants and other baseline methods.

Theoretical and Practical Implications

Theoretically, this research contributes to the field of learning-based control systems by demonstrating the efficacy of transformers in modeling complex humanoid operations. The use of binocular perception and sim-to-real policy transfer enriches strategies for training robotic systems on diverse tasks.

Practically, the capability to autonomously perform complex tasks using minimal real-world demonstrations represents a significant step towards general-purpose humanoid robots. The approach enhances the adaptability of humanoids to function in human-centric environments, leveraging readily available human motion data.

Future Directions

Acknowledging limitations related to hardware constraints, retargeting fidelity, and egocentric vision challenges, the paper highlights areas for future exploration. Enhancing morphological compatibility, improving pose estimation under occlusions, and scaling to more comprehensive navigational tasks are prospective goals. These advancements aim to further the application of humanoids in more dynamic and diverse real-world settings.

In conclusion, "HumanPlus" exemplifies a robust integration of human data into humanoid robotics, setting a foundational approach for future exploration in autonomy and versatility of humanoid applications.
