GMT: General Motion Tracking for Humanoid Whole-Body Control (2506.14770v1)

Published 17 Jun 2025 in cs.RO

Abstract: The ability to track general whole-body motions in the real world is a useful way to build general-purpose humanoid robots. However, achieving this can be challenging due to the temporal and kinematic diversity of the motions, the policy's capability, and the difficulty of coordination of the upper and lower bodies. To address these issues, we propose GMT, a general and scalable motion-tracking framework that trains a single unified policy to enable humanoid robots to track diverse motions in the real world. GMT is built upon two core components: an Adaptive Sampling strategy and a Motion Mixture-of-Experts (MoE) architecture. The Adaptive Sampling automatically balances easy and difficult motions during training. The MoE ensures better specialization of different regions of the motion manifold. We show through extensive experiments in both simulation and the real world the effectiveness of GMT, achieving state-of-the-art performance across a broad spectrum of motions using a unified general policy. Videos and additional information can be found at https://gmt-humanoid.github.io.


Summary

  • The paper introduces a unified policy that leverages adaptive sampling and a Mixture-of-Experts architecture to enhance humanoid motion tracking accuracy.
  • It refines motion datasets by filtering infeasible sequences and incorporating both immediate and future motion frames to capture short- and long-term trends.
  • Experiments on the Unitree G1 robot show lower joint and key body positional errors than existing methods, supporting the claim of superior tracking performance.

Overview of General Motion Tracking for Humanoid Whole-Body Control

The paper "GMT: General Motion Tracking for Humanoid Whole-Body Control" presents a framework that lets humanoid robots imitate a diverse range of human movements with a single unified policy. It targets the main obstacles in humanoid motion tracking: the temporal and kinematic diversity of motions, the limited capability of a single policy, and the difficulty of coordinating the upper and lower body. GMT combines adaptive sampling with a motion Mixture-of-Experts (MoE) architecture to improve policy capability and generalization, and it reports state-of-the-art tracking performance across a broad range of skills.

Core Components and Methodology

Adaptive Sampling Strategy: Large motion datasets are unevenly distributed across motion categories, so uniform sampling lets abundant, easy motions dominate training. Adaptive sampling assigns higher sampling probabilities to less frequent, more challenging motions, where tracking errors are larger. The policy therefore spends more training effort on difficult skills while still revisiting easy ones, which balances learning across the dataset.
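
The summary does not give GMT's exact sampling rule, so the following is only a minimal sketch of error-weighted sampling. It assumes that each clip keeps a running tracking error (an exponential moving average) and that sampling probabilities are proportional to that error, with a floor so easy clips are never dropped. The function names, the momentum value, and the floor are all hypothetical.

```python
import random
from typing import List


def update_clip_error(old: float, new: float, momentum: float = 0.9) -> float:
    """Exponential moving average of a clip's tracking error (hypothetical rule)."""
    return momentum * old + (1.0 - momentum) * new


def sampling_probs(errors: List[float], floor: float = 0.1) -> List[float]:
    """Turn per-clip tracking errors into sampling probabilities.

    Harder clips (larger error) get proportionally more weight; `floor` keeps
    easy clips from disappearing entirely. Both choices are assumptions.
    """
    max_err = max(errors)
    if max_err <= 0.0:
        # No error signal yet: fall back to uniform sampling.
        return [1.0 / len(errors)] * len(errors)
    weights = [max(e / max_err, floor) for e in errors]
    total = sum(weights)
    return [w / total for w in weights]


def sample_clip(probs: List[float], rng: random.Random) -> int:
    """Draw the index of the next training clip."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: two easy clips and one hard clip.
errors = [0.02, 0.02, 0.10]
probs = sampling_probs(errors)  # the hard clip gets ~71% of the samples
idx = sample_clip(probs, random.Random(0))
```

The floor is the design choice that keeps this "balanced" rather than purely hard-example mining: without it, clips the policy already tracks well would almost never be revisited and could be forgotten.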

Motion Mixture-of-Experts (MoE) Architecture: This component increases the model's expressiveness by letting specialized sub-networks ("experts") handle different regions of the motion space. For each input, a gating network dynamically decides how much each expert contributes to the output action. Partitioning the motion space this way copes better with motion diversity than a single monolithic network and improves tracking performance.
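
To make the gating idea concrete, here is a minimal sketch of a soft-gated mixture of experts in plain Python. It assumes linear experts and a linear softmax gate that both read the same observation, and that the final action is the gate-weighted sum of the experts' outputs. GMT's actual experts are neural networks whose sizes and gating details are not given in this summary; all names here are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (each row of W is one output unit)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def moe_action(obs: Vector, experts: List[Matrix], gate: Matrix) -> Tuple[Vector, Vector]:
    """Soft-gated mixture of linear experts.

    `gate` has one row per expert; the softmax over its outputs weights the
    experts' actions. Returns (action, gate_weights) so the weights can be
    inspected to see which expert dominates for a given input.
    """
    g = softmax(matvec(gate, obs))
    outputs = [matvec(W, obs) for W in experts]
    action = [
        sum(g[k] * outputs[k][j] for k in range(len(experts)))
        for j in range(len(outputs[0]))
    ]
    return action, g


# Usage: two experts, a gate that is indifferent between them.
experts = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
neutral_gate = [[0.0, 0.0], [0.0, 0.0]]
action, weights = moe_action([1.0, 0.0], experts, neutral_gate)  # action == [1.5, 0.0]
```

Returning the gate weights alongside the action is what makes the kind of expert-specialization analysis mentioned in the results possible: logging them over a motion shows which expert is active at each moment.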

Data Curation and Input Design: The authors curate the motion dataset by filtering out infeasible and irrelevant sequences, so the policy is trained only on motions the robot can reasonably be expected to track. The motion input combines immediate and future motion frames: the immediate frame supplies the short-term tracking target, while the future frames expose longer-term trends in the motion, which improves tracking fidelity.
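
The following sketch shows one way to assemble such an input at timestep t. It assumes the immediate target is the next reference frame, that the future window holds a fixed number of later frames at a fixed stride, that indices past the end of the clip are clamped to the last frame, and that everything is simply concatenated with proprioception. The paper may instead compress the future window with a learned encoder; the horizon, stride, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def motion_targets(
    motion: Sequence[Frame], t: int, horizon: int, stride: int = 1
) -> Tuple[Frame, List[Frame]]:
    """Immediate target plus a window of future reference frames at step t.

    The immediate target is frame t+1; the window holds `horizon` frames
    spaced `stride` apart after it. Indices past the end of the clip are
    clamped to the last frame so the window length stays fixed.
    """
    last = len(motion) - 1
    immediate = list(motion[min(t + 1, last)])
    future = [
        list(motion[min(t + 1 + k * stride, last)]) for k in range(1, horizon + 1)
    ]
    return immediate, future


def build_observation(
    proprio: Frame, motion: Sequence[Frame], t: int, horizon: int, stride: int = 1
) -> Frame:
    """Concatenate proprioception, the immediate target and the flattened future window."""
    immediate, future = motion_targets(motion, t, horizon, stride)
    return list(proprio) + immediate + [v for frame in future for v in frame]


# Usage: a 1-D "motion" of four frames, looking three frames ahead.
motion = [[0.0], [1.0], [2.0], [3.0]]
obs = build_observation([0.5, -0.5], motion, t=0, horizon=3)
# obs == [0.5, -0.5, 1.0, 2.0, 3.0, 3.0]
```

Clamping keeps the observation a fixed size near the end of a clip, which a fixed-width policy network requires.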

Experimental Validation and Results

The framework is validated in simulation and in the real world on the Unitree G1 humanoid robot. On quantitative metrics such as mean per key body positional error and joint positional error, GMT tracks more accurately than existing approaches such as ExBody2. Adaptive sampling and the MoE are both shown to substantially improve tracking on challenging motions, and qualitative analyses illustrate how individual experts specialize within complex, composite motions.
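
The summary names these metrics without defining them. The sketch below assumes the common definitions: the key body error is the Euclidean distance between robot and reference key body positions, and the joint error is the absolute joint-angle difference, each averaged over all frames and bodies or joints. The paper's exact conventions (which key bodies, global or root-relative positions, units, absolute or squared joint error) may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence

Positions = Sequence[Sequence[Sequence[float]]]  # [T][K][3] key body positions
JointAngles = Sequence[Sequence[float]]          # [T][J] joint positions


def mean_key_body_position_error(robot: Positions, ref: Positions) -> float:
    """Mean Euclidean distance between robot and reference key bodies,
    averaged over all frames and key bodies (same units as the input)."""
    dists = [
        math.dist(p, q)
        for frame_r, frame_g in zip(robot, ref)
        for p, q in zip(frame_r, frame_g)
    ]
    return sum(dists) / len(dists)


def mean_joint_position_error(robot_q: JointAngles, ref_q: JointAngles) -> float:
    """Mean absolute joint-angle difference over all frames and joints."""
    errs = [abs(a - b) for qr, qg in zip(robot_q, ref_q) for a, b in zip(qr, qg)]
    return sum(errs) / len(errs)


# Usage: one frame, two key bodies, one of them 0.3 m off.
robot = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
ref = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.3]]]
e_kp = mean_key_body_position_error(robot, ref)              # ≈ 0.15
e_j = mean_joint_position_error([[0.1, 0.2]], [[0.0, 0.0]])  # ≈ 0.15
```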

Practical and Theoretical Implications

Deploying GMT on a physical humanoid shows that it can reproduce a variety of sophisticated human movements, suggesting applications wherever humanoid robots need adaptive and expressive motor control. On the theoretical side, the paper highlights the importance of counteracting dataset imbalance and increasing model expressiveness, pointing to promising directions for further research on motion imitation and humanoid control.

Future Directions

The current framework lacks contact-rich and terrain-diverse skills. Future work could address this with terrain-aware motion tracking and with support for complex interactions such as falling and recovery. Tracking sequences generated by a motion diffusion model also hints at creative applications of the framework, and more broadly at dynamic, autonomous humanoid behavior in varied environments.
