Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

This presentation explores a breakthrough framework enabling humanoid robots to autonomously perform highly dynamic parkour maneuvers using only onboard depth sensing. By combining motion matching for skill composition, reinforcement learning-based motion tracking, and hybrid distillation techniques, the system achieves robust zero-shot transfer from simulation to hardware. The work demonstrates state-of-the-art agility on a physical humanoid robot, including complex wall climbs, vaults, and adaptive multi-obstacle traversals, establishing a scalable recipe for perception-driven whole-body control in contact-rich environments.
Script
Can a humanoid robot execute a parkour wall climb in under 4 seconds, using only what it sees through a depth camera? The researchers behind this work answer with a resounding yes, introducing a framework that chains dynamic human skills to create adaptive, whole-body parkour entirely onboard.
Building on that vision, let's examine the core challenge this work addresses.
Humanoid parkour faces four interlocking challenges. Robots must execute contact-rich movements with split-second precision, compose atomic skills into seamless chains, adapt perceptually to unpredictable environments, and transfer policies from simulation to hardware without manual reward engineering.
The authors tackle these obstacles with a modular, scalable pipeline.
At the foundation is motion matching, which retrieves the most compatible next frame from a library of human parkour motions. This nearest-neighbor approach elegantly densifies sparse demonstration data, generating diverse trajectories that cover different approach distances, speeds, and obstacle geometries without manual transition design.
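To make the retrieval step concrete, here is a minimal sketch of motion matching as nearest-neighbor search over per-frame features. The feature layout, function names, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_features(frames):
    """Stack per-frame features (e.g. joint positions, velocities,
    future root trajectory) into one matrix for nearest-neighbor search."""
    return np.asarray(frames, dtype=np.float64)

def match_next_frame(library, query, weights=None):
    """Return the index of the library frame closest to the query feature.

    library: (N, D) feature matrix built from human parkour clips
    query:   (D,) feature of the current state plus desired goal
    weights: optional (D,) per-dimension importance weights
    """
    diff = library - query
    if weights is not None:
        diff = diff * weights
    dists = np.linalg.norm(diff, axis=1)  # Euclidean distance per frame
    return int(np.argmin(dists))

# Tiny toy library: each row is one frame's feature vector.
lib = build_features([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
idx = match_next_frame(lib, np.array([0.9, 0.6]))
# Playback then continues from the matched frame, so sparse clips are
# densified into many transition-consistent trajectories.
```

Because the search is just a distance query, varying the goal features (approach distance, speed, obstacle height) yields diverse composed trajectories with no hand-designed transitions.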
Next, the framework trains privileged RL teacher policies on composed trajectories, leveraging full state information for rapid learning. These single-skill experts are then distilled into a unified student policy that operates on noisy depth inputs, using a hybrid objective that blends behavior cloning with reinforcement learning to ensure both kinematic fidelity and high-torque task completion.
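The blended objective can be sketched as a weighted sum of a behavior-cloning term and a policy-gradient surrogate. The weighting scheme, function names, and the REINFORCE-style RL term are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def hybrid_loss(student_actions, teacher_actions, log_probs, advantages,
                bc_weight=0.5):
    """Blend behavior cloning with an RL surrogate (illustrative sketch).

    student_actions, teacher_actions: (B, A) action arrays
    log_probs:  (B,) student log-probabilities of the taken actions
    advantages: (B,) advantage estimates from the task reward
    """
    # BC term: mean squared error against the privileged teacher,
    # driving kinematic fidelity to the reference motion.
    bc = np.mean((student_actions - teacher_actions) ** 2)
    # RL term: REINFORCE-style surrogate; minimizing it increases the
    # probability of high-advantage actions, rewarding task completion
    # even where imitation alone would stall.
    rl = -np.mean(log_probs * advantages)
    return bc_weight * bc + (1.0 - bc_weight) * rl
```

Sweeping `bc_weight` trades imitation accuracy against reward-driven exploration; the point of the hybrid is that neither term alone suffices for contact-rich, high-torque maneuvers.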
This diagram captures the complete architecture. Motion matching generates kinematic references from atomic skills, RL teachers learn to track those references with privileged information, and a depth-based student learns from multiple teachers through hybrid distillation. The result is a policy that deploys zero-shot onto hardware, autonomously chaining parkour skills in real environments.
The true test comes when this pipeline meets physical hardware.
The Unitree G1 humanoid executes a high wall climb in under 4 seconds, scaling an obstacle that's 96 percent of its height. This isn't choreographed: the robot adapts online to terrain changes and obstacle perturbations, demonstrating closed-loop correction and smooth skill transitions across complex multi-obstacle courses.
Ablation studies reveal the necessity of each component. Dense motion-matched references are essential for timing-sensitive skills, pure behavior cloning stalls on contact-rich maneuvers, and the hybrid RL objective provides the episodic signal needed for robust execution. Compared to strong baselines, this framework establishes a clear performance advantage on high-obstacle, fast-transition tasks.
This work delivers a scalable, modular recipe for perception-driven humanoid parkour, proving that motion matching plus hybrid distillation can bridge the gap from sparse human demonstrations to adaptive, real-world agility. To dive deeper into the technical details and video demonstrations, visit EmergentMind.com.