- The paper proposes iDP3, which eliminates camera calibration and segmentation constraints in humanoid manipulation.
- It leverages egocentric 3D visual representations, scaled-up point-cloud input, and a pyramid convolutional encoder that yields smoother policy outputs.
- Empirical results on the Fourier GR1 robot demonstrate superior scene and object generalization, advancing autonomous manipulation.
Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies
The paper presents a method for enhancing autonomous manipulation by humanoid robots in diverse, real-world environments. Its core contribution is the Improved 3D Diffusion Policy (iDP3), which extends the capabilities of humanoid robots to perform a variety of tasks in unstructured settings using only data collected in the lab.
Background and Motivation
Historically, humanoid robots have struggled to generalize manipulation skills beyond a single, controlled environment. One reason is that traditional learning methods rely on camera calibration and precise point-cloud segmentation, neither of which is practical on a dynamic, mobile platform such as a humanoid robot. 3D visuomotor policies are a promising step toward broadening these capabilities, but existing methods such as the 3D Diffusion Policy (DP3) still depend on calibration and segmentation, which limits their applicability.
Improved 3D Diffusion Policy (iDP3)
To address these challenges, the authors propose iDP3, a novel approach built on egocentric 3D visual representations. The method requires neither camera calibration nor point-cloud segmentation, making it suitable for deployment on mobile humanoid platforms. Several critical modifications distinguish iDP3 from its predecessor:
- Egocentric 3D Visual Representations: Representing the scene directly in the camera frame, rather than in a calibrated world frame, removes the calibration constraint entirely (see the first sketch after this list).
- Scaled-Up Vision Input: iDP3 samples substantially more points from the raw, unsegmented point cloud, which compensates for the noise and extraneous geometry that segmentation would otherwise remove (also illustrated in the first sketch).
- Improved Visual Encoder: A pyramid convolutional encoder replaces the MLP encoder used in previous methods, producing smoother policy outputs (see the second sketch below).
- Extended Prediction Horizon: Lengthening the prediction horizon mitigates short-term prediction noise and stabilizes the executed actions (see the configuration sketch below).
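To make the first two modifications concrete, here is a minimal PyTorch sketch, not the authors' code: it lifts a depth image into a camera-frame (egocentric) point cloud, so no camera-to-world extrinsics are ever needed, and then draws a scaled-up sample of points. The intrinsics values, the 4096-point budget, and the use of farthest-point sampling are all illustrative assumptions.

```python
import torch

def depth_to_egocentric_points(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth image to an (N, 3) point cloud expressed in
    the camera (egocentric) frame, so no extrinsic calibration against
    a world frame is required."""
    h, w = depth.shape
    v, u = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype),
        torch.arange(w, dtype=depth.dtype),
        indexing="ij",
    )
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = torch.stack([x, y, depth], dim=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid zero-depth pixels

def farthest_point_sample(pts, k):
    """Select k well-spread points. With no segmentation, a large k
    keeps enough scene coverage despite noise and clutter."""
    n = pts.shape[0]
    idx = torch.zeros(k, dtype=torch.long)
    idx[0] = int(torch.randint(n, (1,)))
    dist = torch.full((n,), float("inf"))
    for i in range(1, k):
        # Distance of every point to its nearest already-selected point.
        dist = torch.minimum(dist, ((pts - pts[idx[i - 1]]) ** 2).sum(-1))
        idx[i] = int(dist.argmax())
    return pts[idx]

# Example: a fake 480x640 depth frame with hypothetical intrinsics.
depth = torch.rand(480, 640) * 2.0
cloud = depth_to_egocentric_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
cloud = cloud[::16]  # crude stride subsample just to keep this demo fast
obs = farthest_point_sample(cloud, k=4096)  # scaled-up vs. typical DP3 budgets
```

Farthest-point sampling is only one reasonable choice here; the key point is that sampling happens on the raw camera-frame cloud, with no segmentation mask and no world-frame transform.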
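For the improved visual encoder, the second sketch below shows one plausible reading of a "pyramid" convolutional encoder, not the paper's exact architecture: pointwise 1D convolutions at several depths are each pooled into a global vector, and the pooled vectors are fused, so the policy conditions on both shallow and deep descriptors of the scene.

```python
import torch
import torch.nn as nn

class PyramidPointEncoder(nn.Module):
    """Illustrative pyramid-style encoder (an assumption, not the
    paper's architecture): pointwise convolutions at three depths are
    each max-pooled into a global vector, then concatenated and fused."""

    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv1d(64, 128, 1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv1d(128, 256, 1), nn.ReLU())
        self.head = nn.Linear(64 + 128 + 256, out_dim)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, N, 3) egocentric point cloud -> (B, 3, N) for Conv1d.
        x = pts.transpose(1, 2)
        f1 = self.stage1(x)   # (B, 64, N)
        f2 = self.stage2(f1)  # (B, 128, N)
        f3 = self.stage3(f2)  # (B, 256, N)
        # Max-pool each stage over the points and concatenate: a
        # shallow-to-deep "pyramid" of global features.
        pooled = torch.cat([f.max(dim=2).values for f in (f1, f2, f3)], dim=1)
        return self.head(pooled)  # (B, out_dim) visual feature
```

Fed a batch from the sampling sketch above, e.g. `PyramidPointEncoder()(obs.unsqueeze(0))`, this yields a (1, 256) conditioning vector for the diffusion head; aggregating multiple feature scales is one intuition for why such an encoder could produce smoother outputs than a single-scale MLP.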
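Finally, the extended prediction horizon is typically just a configuration choice in diffusion-policy implementations. The values below are hypothetical, shown only to make the trade-off concrete: denoising more future steps per pass smooths out short-term noise at the cost of some reactivity.

```python
# Hypothetical diffusion-policy configuration (values are illustrative,
# not taken from the paper).
policy_config = {
    "horizon": 16,        # future action steps denoised jointly; longer
                          # horizons average out short-term prediction noise
    "n_action_steps": 8,  # steps actually executed before re-planning
    "n_obs_steps": 2,     # recent observation frames conditioned on
}
```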
Implementation and Results
The authors implement iDP3 on a full-sized humanoid robot, the Fourier GR1, equipped with onboard sensing. A purpose-built teleoperation system enables robust collection of human demonstrations, although latency remains a practical challenge. Through this system, human demonstrations are translated into manipulation skills the robot can execute.
Empirical evaluations show that iDP3 substantially outperforms both traditional image-based methods and the base DP3 in accuracy and generalization across diverse scenes. In particular, iDP3 exhibits strong scene and object generalization and remarkable view invariance, properties that underscore its practical value for real-world applications.
Conclusion and Future Directions
The successful deployment of iDP3 marks a significant step toward autonomous humanoid robots that can operate in unstructured, unpredictable environments. Open challenges remain, most notably scaling up high-quality training data and coping with sensor noise. The authors point to promising directions for future work: more data-efficient policies, advanced sensing hardware, leveraging pre-trained 3D models to enhance policy performance, and integrating whole-body control to broaden the suite of tasks humanoid robots can perform autonomously.