Human–Humanoid Co-Manipulation via Haptic Intent Inference
The paper presents an innovative approach to human–humanoid cooperative tasks that relies on haptic feedback rather than visual cues, addressing two challenges: inferring human intent and maintaining stable locomotion under variable payloads. The research introduces a hierarchical policy-learning framework named H2-COMPACT, designed to enable legged humanoid robots to jointly manipulate and transport objects with human partners. The work focuses on extracting and reacting to haptic cues, a modality that has historically been underutilized relative to vision in humanoid robotics.
Hierarchical Policy-Learning Framework
The framework is structured in two tiers to manage the complexity of human-robot co-manipulation. The upper tier employs a behavior-cloning network that processes six-axis force/torque data from wrist-mounted sensors and translates these inputs into whole-body planar velocity commands that capture the human leader's intent. The policy is a conditional diffusion model that uses a stationary wavelet transform to encode the force/torque signals, multi-scale Transformer encoders to improve intent inference across time scales, and deterministic DDIM sampling to generate commands at inference time.
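To make this pipeline concrete, the following is a minimal sketch of such a haptic-intent tier: wavelet-encoded wrist wrenches feed a Transformer-based denoiser that is sampled with deterministic DDIM to yield a planar velocity command. The window length, wavelet choice, network sizes, diffusion schedule, and all names (swt_features, HapticIntentDenoiser, ddim_sample) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn


def swt_features(ft_window: np.ndarray, wavelet: str = "db4", level: int = 3) -> np.ndarray:
    """Stationary wavelet transform of a (T, 12) force/torque window
    (two wrists x 6 axes, T a multiple of 2**level); returns (T, 12 * 2 * level)."""
    feats = []
    for ch in range(ft_window.shape[1]):
        for cA, cD in pywt.swt(ft_window[:, ch], wavelet, level=level):
            feats.extend([cA, cD])
    return np.stack(feats, axis=1).astype(np.float32)


class HapticIntentDenoiser(nn.Module):
    """Noise-prediction network of a conditional diffusion policy over the
    planar command [v_x, v_y, yaw_rate], conditioned on haptic features."""

    def __init__(self, feat_dim: int, d_model: int = 128, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(
            nn.Linear(d_model + 3 + 1, 128), nn.SiLU(), nn.Linear(128, 3)
        )

    def forward(self, feats, noisy_cmd, t_frac):
        ctx = self.encoder(self.embed(feats)).mean(dim=1)   # pooled haptic context
        return self.head(torch.cat([ctx, noisy_cmd, t_frac], dim=-1))


@torch.no_grad()
def ddim_sample(model, feats, n_steps: int = 10, n_train_steps: int = 100):
    """Deterministic (eta = 0) DDIM sampling of a velocity command."""
    B = feats.shape[0]
    alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 2e-2, n_train_steps), dim=0)
    ts = torch.linspace(n_train_steps - 1, 0, n_steps).long()
    x = torch.randn(B, 3)                                    # start from pure noise
    for i, t in enumerate(ts):
        a_t = alphas_bar[t]
        a_prev = alphas_bar[ts[i + 1]] if i + 1 < n_steps else torch.tensor(1.0)
        eps = model(feats, x, torch.full((B, 1), float(t) / (n_train_steps - 1)))
        x0 = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()     # predicted clean command
        x = a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps # deterministic DDIM step
    return x                                                 # (B, 3): [v_x, v_y, yaw_rate]
```

In a deployment like the one described, the sampled planar command would then be handed to the lower-tier locomotion policy discussed next.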
The lower tier translates these high-level velocity commands into joint trajectories through a deep reinforcement learning policy. Training occurs in simulation using Proximal Policy Optimization (PPO) with randomized payload masses and ground friction, so that the policy adapts to variable real-world conditions. Sim-to-real transfer was demonstrated on the Unitree G1 robot, highlighting the robustness and flexibility of the approach.
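A brief sketch of the two ingredients such a lower tier needs, per-episode domain randomization and a velocity-tracking reward, appears below. The randomization ranges, reward weights, and function names are assumptions for illustration; the paper's exact configuration is not reproduced here.

```python
import numpy as np


def sample_domain_randomization(rng: np.random.Generator) -> dict:
    """Draw per-episode physical parameters so the PPO policy experiences
    varied payloads and contact conditions (ranges are illustrative)."""
    return {
        "payload_mass_kg": float(rng.uniform(0.0, 10.0)),
        "ground_friction": float(rng.uniform(0.4, 1.2)),
        "payload_com_offset_m": rng.uniform(-0.05, 0.05, size=2).tolist(),
    }


def velocity_tracking_reward(cmd, base_lin_vel, base_yaw_rate, joint_torques) -> float:
    """Reward tracking of the haptic tier's planar command [v_x, v_y, yaw_rate]
    while penalizing actuation effort."""
    cmd = np.asarray(cmd, dtype=float)
    lin_err = np.sum((cmd[:2] - np.asarray(base_lin_vel[:2], dtype=float)) ** 2)
    yaw_err = (cmd[2] - base_yaw_rate) ** 2
    effort = 1e-4 * np.sum(np.square(joint_torques))
    return float(np.exp(-4.0 * lin_err) + 0.5 * np.exp(-4.0 * yaw_err) - effort)


# Example usage inside a (hypothetical) episode reset and reward step:
rng = np.random.default_rng(0)
episode_params = sample_domain_randomization(rng)
r = velocity_tracking_reward([0.5, 0.0, 0.1], [0.45, 0.02], 0.08, np.zeros(23))
```

In practice these pieces would be plugged into a parallelized simulation and a standard PPO training loop; the exponential tracking terms simply reward small velocity errors smoothly rather than with a hard threshold.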
Numerical Results and Implications
In real-world trials, the system was compared against human-human cooperation baselines, and the humanoid achieved performance indicative of effective cooperation: completion times and trajectory deviations comparable to those of blindfolded human followers, along with improved kinematic synchrony. The average follower force was lower than in the human-only setup, underscoring the system's ability to interpret and act on haptic cues while requiring reduced effort from the human leader.
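As a rough illustration of how such co-manipulation metrics can be computed from logged trials, the sketch below uses RMS path deviation, zero-lag velocity correlation as a synchrony proxy, and mean wrench magnitude. These definitions are assumptions for exposition and may differ from the paper's exact metrics.

```python
import numpy as np


def trajectory_deviation(executed_xy: np.ndarray, reference_xy: np.ndarray) -> float:
    """RMS planar deviation between executed and reference paths, each (T, 2)."""
    return float(np.sqrt(np.mean(np.sum((executed_xy - reference_xy) ** 2, axis=1))))


def kinematic_synchrony(leader_speed: np.ndarray, follower_speed: np.ndarray) -> float:
    """Zero-lag normalized correlation of leader and follower speed profiles (T,)."""
    l = (leader_speed - leader_speed.mean()) / (leader_speed.std() + 1e-8)
    f = (follower_speed - follower_speed.mean()) / (follower_speed.std() + 1e-8)
    return float(np.mean(l * f))


def mean_follower_force(wrench_log: np.ndarray) -> float:
    """Mean force magnitude from a (T, 6) wrench log [Fx, Fy, Fz, Tx, Ty, Tz]."""
    return float(np.linalg.norm(wrench_log[:, :3], axis=1).mean())
```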
These findings suggest practical implications for humanoid robots in environments where occlusion by the carried payload or crowded spaces restricts traditional vision-driven interaction. Reliance on haptic feedback opens new possibilities for collaborative robotics in logistics and for personal robotic assistants that must share physical tasks with people.
Theoretical Contributions and Future Directions
Theoretically, this work advances the understanding of haptic perception in humanoid robotics, demonstrating that forces and torques can be as informative for intent inference as visual stimuli, and articulating a refined hierarchical system for processing such inputs. By decoupling haptic intent inference from locomotion control, the paper provides a robust framework adaptable to varied payload scenarios.
Looking forward, further developments could include extension to multi-contact scenarios beyond dual wrist-mounted sensors, integration of visual cues for richer contextual awareness when the view is unobstructed, and application of the framework to forms of robotic manipulation beyond humanoids.
The research presented in this paper reflects a significant step in leveraging underexplored haptic data for robotics, showing that legged humanoids can participate fluidly in cooperative physical tasks alongside humans. Through adaptive controller design and careful attention to the nuances of human-robot interaction, this work lays a foundation for future exploration of more intuitive and accessible human-robot collaboration strategies.