Human–Humanoid Co-Manipulation via Haptic Intent Inference
The paper presents an innovative approach to human–humanoid cooperative tasks that relies on haptic feedback rather than visual cues, addressing two challenges: inferring human intent and maintaining stable locomotion under variable payloads. The research introduces a hierarchical policy-learning framework named H2-COMPACT, designed to enable legged humanoid robots to jointly manipulate and transport objects with human partners. The work focuses on extracting and reacting to haptic cues, a modality that has historically been underutilized relative to vision in humanoid robotics.
Hierarchical Policy-Learning Framework
The framework is structured in two tiers to manage the complexity of human-robot co-manipulation. The upper tier employs a behavior-cloning network that processes six-axis force/torque data from wrist-mounted sensors and translates these inputs into whole-body planar velocity commands that capture the human leader's intent. The policy is a conditional diffusion model that uses a stationary wavelet transform to encode the force/torque signals, multi-scale Transformer encoders to improve intent inference across time scales, and deterministic DDIM sampling to generate commands at inference time.
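To make this pipeline concrete, the following is a minimal sketch of such a haptic-intent tier: wavelet-encoded wrist wrenches feed a Transformer-based denoiser that is sampled with deterministic DDIM to yield a planar velocity command. The window length, wavelet choice, network sizes, diffusion schedule, and all names (swt_features, HapticIntentDenoiser, ddim_sample) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn


def swt_features(ft_window: np.ndarray, wavelet: str = "db4", level: int = 3) -> np.ndarray:
    """Stationary wavelet transform of a (T, 12) force/torque window
    (two wrists x 6 axes, T a multiple of 2**level); returns (T, 12 * 2 * level)."""
    feats = []
    for ch in range(ft_window.shape[1]):
        for cA, cD in pywt.swt(ft_window[:, ch], wavelet, level=level):
            feats.extend([cA, cD])
    return np.stack(feats, axis=1).astype(np.float32)


class HapticIntentDenoiser(nn.Module):
    """Noise-prediction network of a conditional diffusion policy over the
    planar command [v_x, v_y, yaw_rate], conditioned on haptic features."""

    def __init__(self, feat_dim: int, d_model: int = 128, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(
            nn.Linear(d_model + 3 + 1, 128), nn.SiLU(), nn.Linear(128, 3)
        )

    def forward(self, feats, noisy_cmd, t_frac):
        ctx = self.encoder(self.embed(feats)).mean(dim=1)   # pooled haptic context
        return self.head(torch.cat([ctx, noisy_cmd, t_frac], dim=-1))


@torch.no_grad()
def ddim_sample(model, feats, n_steps: int = 10, n_train_steps: int = 100):
    """Deterministic (eta = 0) DDIM sampling of a velocity command."""
    B = feats.shape[0]
    alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 2e-2, n_train_steps), dim=0)
    ts = torch.linspace(n_train_steps - 1, 0, n_steps).long()
    x = torch.randn(B, 3)                                    # start from pure noise
    for i, t in enumerate(ts):
        a_t = alphas_bar[t]
        a_prev = alphas_bar[ts[i + 1]] if i + 1 < n_steps else torch.tensor(1.0)
        eps = model(feats, x, torch.full((B, 1), float(t) / (n_train_steps - 1)))
        x0 = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()     # predicted clean command
        x = a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps # deterministic DDIM step
    return x                                                 # (B, 3): [v_x, v_y, yaw_rate]
```

In a deployment like the one described, the sampled planar command would then be handed to the lower-tier locomotion policy discussed next.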
The lower tier translates these high-level velocity commands into joint trajectories through a deep reinforcement learning policy. Training occurs in simulation using Proximal Policy Optimization (PPO) with randomized payload masses and ground friction, so that the policy adapts to variable real-world conditions. Sim-to-real transfer was demonstrated on the Unitree G1 robot, highlighting the robustness and flexibility of the approach.
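A brief sketch of the two ingredients such a lower tier needs, per-episode domain randomization and a velocity-tracking reward, appears below. The randomization ranges, reward weights, and function names are assumptions for illustration; the paper's exact configuration is not reproduced here.

```python
import numpy as np


def sample_domain_randomization(rng: np.random.Generator) -> dict:
    """Draw per-episode physical parameters so the PPO policy experiences
    varied payloads and contact conditions (ranges are illustrative)."""
    return {
        "payload_mass_kg": float(rng.uniform(0.0, 10.0)),
        "ground_friction": float(rng.uniform(0.4, 1.2)),
        "payload_com_offset_m": rng.uniform(-0.05, 0.05, size=2).tolist(),
    }


def velocity_tracking_reward(cmd, base_lin_vel, base_yaw_rate, joint_torques) -> float:
    """Reward tracking of the haptic tier's planar command [v_x, v_y, yaw_rate]
    while penalizing actuation effort."""
    cmd = np.asarray(cmd, dtype=float)
    lin_err = np.sum((cmd[:2] - np.asarray(base_lin_vel[:2], dtype=float)) ** 2)
    yaw_err = (cmd[2] - base_yaw_rate) ** 2
    effort = 1e-4 * np.sum(np.square(joint_torques))
    return float(np.exp(-4.0 * lin_err) + 0.5 * np.exp(-4.0 * yaw_err) - effort)


# Example usage inside a (hypothetical) episode reset and reward step:
rng = np.random.default_rng(0)
episode_params = sample_domain_randomization(rng)
r = velocity_tracking_reward([0.5, 0.0, 0.1], [0.45, 0.02], 0.08, np.zeros(23))
```

In practice these pieces would be plugged into a parallelized simulation and a standard PPO training loop; the exponential tracking terms simply reward small velocity errors smoothly rather than with a hard threshold.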
Numerical Results and Implications
In real-world trials, the system was compared against human-human cooperation baselines, and the humanoid achieved performance indicative of effective cooperation: completion times and trajectory deviations comparable to those of blindfolded human followers, along with improved kinematic synchrony. The average follower force was lower than in the human-only setup, underscoring the system's ability to interpret and act on haptic cues while requiring reduced effort from the human leader.
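As a rough illustration of how such co-manipulation metrics can be computed from logged trials, the sketch below uses RMS path deviation, zero-lag velocity correlation as a synchrony proxy, and mean wrench magnitude. These definitions are assumptions for exposition and may differ from the paper's exact metrics.

```python
import numpy as np


def trajectory_deviation(executed_xy: np.ndarray, reference_xy: np.ndarray) -> float:
    """RMS planar deviation between executed and reference paths, each (T, 2)."""
    return float(np.sqrt(np.mean(np.sum((executed_xy - reference_xy) ** 2, axis=1))))


def kinematic_synchrony(leader_speed: np.ndarray, follower_speed: np.ndarray) -> float:
    """Zero-lag normalized correlation of leader and follower speed profiles (T,)."""
    l = (leader_speed - leader_speed.mean()) / (leader_speed.std() + 1e-8)
    f = (follower_speed - follower_speed.mean()) / (follower_speed.std() + 1e-8)
    return float(np.mean(l * f))


def mean_follower_force(wrench_log: np.ndarray) -> float:
    """Mean force magnitude from a (T, 6) wrench log [Fx, Fy, Fz, Tx, Ty, Tz]."""
    return float(np.linalg.norm(wrench_log[:, :3], axis=1).mean())
```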
These findings suggest practical implications for humanoid robots in environments where occlusion by the carried payload or crowded spaces restricts traditional vision-driven interaction. Reliance on haptic feedback opens new possibilities for collaborative robotics in logistics and for personal robotic assistants that must share physical tasks with people.
Theoretical Contributions and Future Directions
Theoretically, this work advances the understanding of haptic perception in humanoid robotics, demonstrating that forces and torques can be as informative for intent inference as visual stimuli, and articulating a refined hierarchical system for processing such inputs. By decoupling haptic intent inference from locomotion control, the paper provides a robust framework adaptable to varied payload scenarios.
Looking forward, further developments could include extension to multi-contact scenarios beyond dual wrist-mounted sensors, integration of visual cues for richer contextual awareness when the view is unobstructed, and application of the framework to forms of robotic manipulation beyond humanoids.
The research presented in this paper reflects a significant step in leveraging underexplored haptic data for robotics, showing that legged humanoids can participate fluidly in cooperative physical tasks alongside humans. Through adaptive controller design and careful attention to the nuances of human-robot interaction, this work lays a foundation for future exploration of more intuitive and accessible human-robot collaboration strategies.