Human2LocoMan: A Framework for Versatile Quadrupedal Manipulation
The paper "Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining" introduces an innovative approach to overcoming the persistent challenge of equipping quadrupedal robots with diverse manipulation skills using a scalable learning method. The principal contribution of this work is the development of Human2LocoMan, a framework designed to leverage human demonstrations for the training of quadrupedal robots, enabling them to perform intricate manipulation tasks efficiently.
Framework Design
The Human2LocoMan system bridges human demonstration and robot training through a unified teleoperation system built on extended reality (XR) technology. Human actions captured with an XR headset are mapped onto the action space of LocoMan, a quadrupedal robot equipped with versatile manipulation capabilities. By capturing whole-body movements and aligning human and robot observation and action spaces within a unified coordinate frame, the system provides a robust data collection pipeline for imitation learning.
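To make the retargeting concrete, the sketch below shows one common way such a mapping can be implemented: wrist poses from the XR device are re-expressed in a shared coordinate frame, and the human hand's incremental motion is applied to the robot's end-effector target. The function names, the delta-pose scheme, and the use of 4x4 homogeneous transforms are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

# Minimal retargeting sketch (illustrative, not the paper's exact code):
# XR wrist poses arrive in the headset's world frame; the robot consumes
# end-effector targets in a frame shared by both embodiments.

def to_unified(T_unified_from_source: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Re-express a 4x4 homogeneous pose in the unified frame."""
    return T_unified_from_source @ pose

def retarget_delta(prev_hand: np.ndarray, curr_hand: np.ndarray,
                   curr_eef: np.ndarray) -> np.ndarray:
    """Apply the human wrist's incremental motion (already in the unified
    frame) to the robot end-effector, so absolute workspace offsets
    between the two embodiments cancel out."""
    delta = curr_hand @ np.linalg.inv(prev_hand)  # motion since the last frame
    return delta @ curr_eef                       # next end-effector target
```

Commanding relative rather than absolute poses is a standard teleoperation choice: it tolerates differences in scale and origin between the human's workspace and the robot's.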
Technical Architecture
The core of Human2LocoMan is the Modularized Cross-embodiment Transformer (MXT), a Transformer-based architecture whose modular design enables efficient cross-embodiment learning while accommodating the inherent differences in data modalities across embodiments. Embodiment-specific tokenizers and detokenizers, tailored to the different modalities, surround a shared trunk, facilitating positive transfer of skills from human demonstrations to robot manipulation policies. The MXT policy is first pretrained on human data to learn relevant manipulation patterns and then finetuned on a smaller amount of robot data, demonstrating effective knowledge transfer across embodiments.
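The sketch below illustrates the modular idea under stated assumptions: per-embodiment tokenizers and detokenizers wrap a shared Transformer trunk, and a string key selects which modules run. All dimensions, module names, and the single-modality simplification are assumptions for illustration; the actual MXT handles multiple modalities (such as vision and proprioception) per embodiment.

```python
import torch
import torch.nn as nn

class MXTSketch(nn.Module):
    """Illustrative modular cross-embodiment policy, not the paper's
    exact design: embodiment-specific tokenizers and detokenizers
    around a Transformer trunk shared by all embodiments."""

    def __init__(self, d_model: int = 256, n_layers: int = 6, n_heads: int = 8):
        super().__init__()
        # Shared trunk: reused unchanged across human and robot data.
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        # One tokenizer/detokenizer pair per embodiment (dims assumed).
        self.tokenizers = nn.ModuleDict({
            "human": nn.Linear(32, d_model),
            "robot": nn.Linear(48, d_model),
        })
        self.detokenizers = nn.ModuleDict({
            "human": nn.Linear(d_model, 20),
            "robot": nn.Linear(d_model, 24),
        })

    def forward(self, obs: torch.Tensor, embodiment: str) -> torch.Tensor:
        # obs: (batch, seq_len, obs_dim) for the chosen embodiment.
        tokens = self.tokenizers[embodiment](obs)
        latent = self.trunk(tokens)
        return self.detokenizers[embodiment](latent)
```

In this scheme, pretraining runs batches with embodiment="human"; finetuning then reuses the trunk weights while the robot-side tokenizer and detokenizer adapt to the robot's observation and action dimensions.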
Empirical Validation
The empirical results affirm the efficacy of Human2LocoMan in enhancing the manipulation capabilities of quadrupedal robots. Tested on six household manipulation tasks spanning unimanual and bimanual modes, the framework registered notable improvements in task success rates: on average, the MXT-trained policy outperformed baseline models by 41.9% overall and by 79.7% in out-of-distribution (OOD) settings. Human pretraining alone contributed a 38.6% improvement overall and 82.7% under OOD conditions, proving instrumental for robust performance from limited robot data. These results underscore Human2LocoMan's potential not only for versatile manipulation but also for scalability and generalization across diverse tasks and object distributions.
Practical and Theoretical Implications
Practically, Human2LocoMan presents a promising pathway for scalable robot training, significantly lowering data collection and computational costs while broadening the scope of robotic applications in complex environments. Theoretically, the framework challenges existing paradigms in robot learning by demonstrating the effectiveness of cross-embodiment knowledge transfer and modularity in deep learning architectures, paving the way for further exploration into scalable multi-embodiment learning systems.
Future Directions
Future research could explore extending the Human2LocoMan framework to other robotic platforms like humanoid robots and robotic arms, assessing its scalability across different physical embodiments. Moreover, incorporating large-scale heterogeneous robotic datasets could provide additional insights into the framework’s robustness and adaptability, further advancing the domain of robotic learning.
In conclusion, "Human2LocoMan" sets a new benchmark in quadrupedal robot learning, leveraging human demonstrations to expand the horizons of robotic manipulation while addressing scalability and efficiency, two core challenges in robot autonomy.