- The paper proposes a novel framework that leverages the PH2D dataset of VR-collected human manipulation demonstrations and the Human Action Transformer (HAT) model to train humanoid robot policies.
- Real-robot evaluations demonstrate that incorporating human data enhances the robustness and generalization of learned policies across diverse manipulation tasks and objects not seen during training.
- The research suggests human data can be central to forming robust, generalized robot control policies and opens avenues for future work incorporating broader human skills and multi-modal inputs.
Analysis of Humanoid Policy Learning from Human Demonstrations
The paper "Humanoid Policy ∼ Human Policy" explores the intersection of human data and humanoid robot policy learning. The focus is on mitigating the significant challenges present in scaling up data collection for humanoid robots by leveraging human demonstrations. This research proposes a novel framework where task-oriented egocentric human data serves as a cross-embodiment training interface, enhancing the efficiency and generalizability of robotic manipulation policies.
Key Contributions
- Data Collection with PH2D: The authors introduce the Physical Human-Humanoid Data (PH2D) dataset, which encompasses large-scale, task-oriented human manipulation demonstrations collected using consumer-grade Virtual Reality (VR) devices. This dataset stands out for its scale and the accuracy of 3D hand-finger pose data, addressing the need for extensive, realistic training inputs without reliance on modular perception systems.
- Human Action Transformer (HAT): The researchers develop the Human Action Transformer (HAT), which unifies the state-action space of humans and humanoids. HAT retargets human actions to robot end-effectors using inverse kinematics, enabling end-to-end policy training. The architecture bridges the embodiment gap by treating bimanual human manipulation as a template for humanoid actions (see the first sketch after this list).
- Generalization and Robustness: Through real-robot evaluations across diverse manipulation tasks, the paper demonstrates that co-training on human data improves both the robustness and the generalization of learned policies. The proposed method performs better in environments and with objects that were not part of the training set, underscoring the value of human-derived data (a co-training sketch also follows this list).
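To make the unified state-action space concrete, here is a minimal sketch of a shared, human-centric interface under which human and robot demonstrations could be expressed interchangeably. All names (UnifiedFrame, frame_from_human, frame_from_robot, deploy_action) and array layouts are hypothetical illustrations, not the paper's actual implementation.

```python
# Minimal sketch of a shared, human-centric state-action interface for
# co-training on human (PH2D-style) and robot demonstrations. All names
# and array layouts are hypothetical; the actual HAT implementation may differ.
from dataclasses import dataclass
import numpy as np


@dataclass
class UnifiedFrame:
    """One time step expressed in the shared state-action space."""
    rgb: np.ndarray          # egocentric image, (H, W, 3)
    wrist_pose: np.ndarray   # left/right wrist poses, (2, 6): xyz + rpy
    finger_tips: np.ndarray  # 3D fingertip positions per hand, (2, 5, 3)


def frame_from_human(vr_sample: dict) -> UnifiedFrame:
    """Human VR data already lives in the shared space: use it directly."""
    return UnifiedFrame(
        rgb=vr_sample["egocentric_rgb"],
        wrist_pose=vr_sample["wrist_pose"],
        finger_tips=vr_sample["finger_tips"],
    )


def frame_from_robot(robot_sample: dict, fk) -> UnifiedFrame:
    """Robot data is mapped into the same space via forward kinematics:
    joint angles -> end-effector (wrist) poses and fingertip positions."""
    wrist_pose, finger_tips = fk(robot_sample["joint_positions"])
    return UnifiedFrame(
        rgb=robot_sample["head_camera_rgb"],
        wrist_pose=wrist_pose,
        finger_tips=finger_tips,
    )


def deploy_action(policy_output: np.ndarray, ik) -> np.ndarray:
    """At deployment, the policy's human-like action (target wrist poses
    and fingertip positions) is converted to joint commands with IK."""
    return ik(policy_output)
```

The robustness gains come from mixing the two data sources during training. Below is a minimal co-training sketch, assuming a hypothetical mixed_batches helper and a fixed human-to-robot ratio; the paper's actual sampling scheme and ratio may differ.

```python
# Minimal sketch of co-training on human and robot demonstrations by
# sampling each batch from the two sources with a fixed mixing ratio.
# The ratio value and function name are illustrative, not the paper's
# reported configuration.
import random


def mixed_batches(human_data, robot_data, batch_size=64, human_ratio=0.5):
    """Yield batches mixing human and robot samples.

    `human_data` / `robot_data` are lists of samples already expressed
    in the shared state-action space (see the previous sketch)."""
    n_human = int(batch_size * human_ratio)
    n_robot = batch_size - n_human
    while True:
        batch = random.sample(human_data, n_human) + random.sample(robot_data, n_robot)
        random.shuffle(batch)
        yield batch
```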
Implications and Theoretical Underpinnings
This research positions PH2D as a pivotal resource for cross-embodiment policy training, advancing how effectively humanoid robots can learn manipulation tasks from human demonstrations. The implications extend to domains such as more adaptive household robots and collaborative robots in industrial settings. Theoretically, the paper suggests a paradigm shift in which human data is not merely auxiliary but central to forming robust, generalized control policies for robots.
Considerations and Challenges
One of the primary challenges discussed is the difference in embodiment dynamics between humans and robots. The paper addresses this through careful retargeting and by slowing down human action sequences to match the speed at which current robot hardware can execute them. It also emphasizes collecting human demonstrations with minimal whole-body movement to better align with the mechanical capabilities of today's humanoid robots.
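One way to realize the slow-down is to stretch the human trajectory's timeline before training, for example by linear interpolation. The sketch below is an illustration under assumptions: the slow_down function name, the factor of 4, and plain linear interpolation (which is only approximate for rotation components) are not taken from the paper.

```python
# Minimal sketch of slowing down a human demonstration to robot speed by
# stretching its timeline with linear interpolation. The slow-down factor
# and array layout are illustrative; the paper may handle this differently
# (e.g. by instructing demonstrators to move slowly during collection).
import numpy as np


def slow_down(trajectory: np.ndarray, factor: int = 4) -> np.ndarray:
    """Stretch a (T, D) trajectory of poses/actions to (factor * T, D)."""
    T, D = trajectory.shape
    t_src = np.arange(T)
    t_dst = np.linspace(0, T - 1, factor * T)
    return np.stack(
        [np.interp(t_dst, t_src, trajectory[:, d]) for d in range(D)],
        axis=1,
    )
```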
In addition, the paper discusses the computational cost of scaling training across diverse data sources and the potential of integrating more sophisticated models, such as large-scale language-conditioned policies, which could further improve the contextual understanding and adaptability of humanoid robots.
Future Directions
The research opens avenues for continued exploration of using human data to train robots. Future work could expand the dataset to include a broader array of gestural and non-verbal communication skills, enhancing the emotional and social intelligence of robots. Furthermore, exploring multi-modal inputs, including language and tactile information, and incorporating them into frameworks like HAT could yield even more robust and versatile humanoid systems.
Overall, this paper lays a strong foundation for future research in the domain of humanoid learning, positing human behavior not just as an inspiration but as a direct informant of robotic function.