HumanoidBench: Benchmark for Robot Control

Updated 10 October 2025
  • HumanoidBench is a simulation benchmark that evaluates full-body locomotion and manipulation through 27 diverse tasks on a high-dimensional Unitree H1 humanoid robot.
  • It incorporates rich multimodal sensor streams, including proprioception, vision, and tactile feedback, enabling realistic coordination and long-horizon planning.
  • Hierarchical reinforcement learning strategies, which decouple high-level decision making from low-level motor control, deliver significant improvements over flat RL on these high-dimensional tasks.

HumanoidBench is a high-dimensional simulated robot learning benchmark specifically designed for comprehensive evaluation of whole-body locomotion and manipulation algorithms on anthropomorphic platforms. Developed as a response to the hardware bottlenecks characteristic of physical humanoid robotics, HumanoidBench provides a safe, reproducible, and accessible testbed with a strong emphasis on realistic coordination, dexterous manipulation, and challenging long-horizon planning. It features the Unitree H1 humanoid equipped with dexterous hands, rendered in MuJoCo with 101 degrees of freedom, and supports multimodal sensor streams including proprioception, egocentric vision, and dense tactile feedback. The benchmark comprises a suite of 27 tasks that stress current controllers and algorithmic paradigms, and has catalyzed a wide array of innovations in reinforcement learning and control research.

1. Simulated Humanoid Platform and Observation Structure

HumanoidBench employs a detailed simulation of the Unitree H1 humanoid, extended with two Shadow Hand–inspired 21-DoF dexterous manipulators, achieving a total of 101 DoFs and a 61-dimensional position-controlled action space (19 for the main body, 21 for each hand). Control executes at 50 Hz. Sensory modalities include:

  • State-based proprioception: 151-dimensional observation vector (joint positions/velocities).
  • Egocentric vision: two head-mounted RGB cameras.
  • Whole-body tactile sensing: 448 taxels distributed across limbs and torso.

The action space is normalized to $[-1, 1]^{61}$, consistent with contemporary RL practice. The full sensor suite supports research into multimodal learning and sim-to-real transfer.
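The following sketch shows how such a task might be instantiated and stepped through a Gymnasium-style interface. The package import and task ID (humanoid_bench, h1hand-walk-v0) follow the conventions of the public HumanoidBench repository but should be treated as assumptions here, as should the commented shape values.

```python
# Minimal sketch: instantiating and stepping a HumanoidBench-style task
# through the Gymnasium API. Package and task names are assumptions
# based on the public repository.
import gymnasium as gym
import humanoid_bench  # assumed to register the benchmark tasks

env = gym.make("h1hand-walk-v0")

# State-based proprioception: 151-dim vector of joint positions/velocities.
print(env.observation_space.shape)  # expected: (151,)
# Position-controlled actions, normalized to [-1, 1]^61.
print(env.action_space.shape)       # expected: (61,)

obs, info = env.reset(seed=0)
for _ in range(100):  # 100 steps at 50 Hz control = 2 s of simulated time
    action = env.action_space.sample()  # random-policy placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```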

2. Benchmark Task Suite and Associated Control Challenges

HumanoidBench tasks are divided into 12 primitive locomotion skills (walk, stand, hurdle, maze, sit, pole, balance, stair, run, slide, crawl, reach) and 15 whole-body manipulation tasks (e.g., push, cabinet opening, box reorganization, in-hand cube manipulation, basketball catch-and-throw, kitchen assembly). The task set is characterized by:

  • Long-horizon objectives demanding sequential reasoning (e.g., multi-step truck unloading).
  • Complex contact dynamics (hands, feet, torso).
  • Intricate whole-body coordination (locomotion overlapped with manipulation).
  • Sparsity and delay of reward signals in some tasks.
  • High-dimensional, partially redundant action spaces, exacerbating the exploration problem.

Standard monolithic RL algorithms often struggle to achieve meaningful performance due to the compounding effects of high state/action dimensionality and challenging reward landscapes.

3. Algorithmic Performance: Flat RL, Hierarchies, and Recent Innovations

Several state-of-the-art RL algorithms, both model-free (Soft Actor-Critic (SAC), PPO) and model-based (DreamerV3, TD-MPC2), have been benchmarked on HumanoidBench. Core findings include (a minimal flat-RL training sketch follows the list):

  • Pure end-to-end RL policies manage basic locomotion ("walk," "stand") only after millions of environment steps; performance on multimodal manipulation is poor.
  • High-dimensionality and redundant DoFs (e.g., unused fingers during walking) act as distractors, enlarging the exploration burden.
  • Sparse rewards exacerbate the difficulty of long-horizon manipulation challenges, with most baseline policies failing to discover consistent success strategies.
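As a concrete illustration of the flat (monolithic) baseline setup these findings refer to, the sketch below trains SAC end to end on a single locomotion task with Stable-Baselines3. The environment ID, step budget, and hyperparameters are illustrative placeholders, not the benchmark's reported configuration.

```python
# Flat-RL baseline sketch: end-to-end SAC on one locomotion task using
# Stable-Baselines3. Hyperparameters are illustrative, not the paper's.
import gymnasium as gym
import humanoid_bench  # assumed task registration; see earlier sketch
from stable_baselines3 import SAC

env = gym.make("h1hand-walk-v0")

model = SAC(
    "MlpPolicy",            # proprioceptive (state-based) observations
    env,
    buffer_size=1_000_000,
    learning_starts=10_000,
    verbose=1,
)
# As noted above, flat RL typically needs millions of environment steps
# even for basic locomotion such as walking.
model.learn(total_timesteps=5_000_000)
model.save("sac_h1hand_walk")
```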

A hierarchical reinforcement learning paradigm is shown to be significantly more effective. In this setup:

  • A high-level policy outputs task-relevant waypoints or goals (e.g., 3D spatial targets), leveraging low-level solvers explicitly pretrained for primitive motion skills (e.g., “reach,” “walk”)—often via PPO and massive environment parallelization (e.g., MuJoCo MJX).
  • Decoupling high-level decision making from low-level motor control enables re-use of robust skills and shrinks the effective exploration space for the high-level controller.
  • Empirical results indicate that hierarchical approaches achieve higher success rates on tasks such as push and multi-step manipulation than flat, monolithic baselines (a minimal sketch of this decomposition follows the list).
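The sketch below illustrates this decomposition, assuming a pretrained low-level reaching skill exposed as a callable. All interface names (HighLevelPolicy, load_pretrained_reach) are hypothetical placeholders, not the benchmark's API; the point is the setpoint interface and the two timescales.

```python
# Hierarchical control sketch: a high-level policy emits a 3D waypoint at
# a coarse timescale; a pretrained low-level skill tracks it at 50 Hz.
# All names and shapes here are hypothetical placeholders.
import numpy as np

class HighLevelPolicy:
    """Maps task observations to 3D spatial targets (random placeholder)."""
    def act(self, obs: np.ndarray) -> np.ndarray:
        return np.random.uniform(low=[-1.0, -1.0, 0.5],
                                 high=[1.0, 1.0, 1.5], size=3)

def load_pretrained_reach():
    """Stand-in for a low-level 'reach' skill pretrained with PPO."""
    def policy(obs: np.ndarray, goal: np.ndarray) -> np.ndarray:
        # A real skill would map (obs, goal) to 61-dim joint targets.
        return np.zeros(61)
    return policy

high_level = HighLevelPolicy()
low_level = load_pretrained_reach()

obs = np.zeros(151)                # placeholder proprioceptive state
goal = high_level.act(obs)
for t in range(1000):              # 50 Hz low-level control loop
    if t % 50 == 0:                # high level re-plans at ~1 Hz
        goal = high_level.act(obs)
    action = low_level(obs, goal)  # low level tracks the current waypoint
    # obs, reward, ... = env.step(action)  # environment step elided
```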

4. Reward Shaping and Task Formalization

Reward functions in HumanoidBench are based on a “tolerance” design, closely following the DeepMind Control Suite convention:

$$R(s, a) = \text{stable} \times \text{tol}(v_x, (1, \infty), 1)$$

Here, $\text{tol}(x, (\text{lower}, \text{upper}), \text{margin})$ outputs 1 when $x$ lies within the specified interval and decays smoothly toward zero outside it, over a width set by the margin. Individual tasks use weighted compositions of physical and geometric criteria (e.g., base height, velocity, hand-target distance), providing dense reward feedback even over long task horizons.
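This tolerance shaping can be reproduced in a few lines. The sketch below mirrors the Gaussian-falloff convention of dm_control's rewards.tolerance (the reference implementation); the value_at_margin default of 0.1 follows dm_control, but treat the exact calibration as an assumption.

```python
# Self-contained sketch of a dm_control-style tolerance reward with a
# Gaussian falloff outside the target interval.
import numpy as np

def tolerance(x, bounds=(0.0, 0.0), margin=0.0, value_at_margin=0.1):
    lower, upper = bounds
    in_bounds = (lower <= x) & (x <= upper)
    if margin == 0.0:
        return np.where(in_bounds, 1.0, 0.0)
    # Distance outside the interval, in units of the margin.
    d = np.where(x < lower, lower - x, x - upper) / margin
    # Gaussian calibrated so the value at distance `margin` equals
    # `value_at_margin`.
    scale = np.sqrt(-2.0 * np.log(value_at_margin))
    return np.where(in_bounds, 1.0, np.exp(-0.5 * (d * scale) ** 2))

# Walking-style velocity term: full reward once forward velocity v_x
# exceeds 1 m/s, decaying smoothly below it over a margin of 1.
v_x = 0.6
move = tolerance(v_x, bounds=(1.0, float("inf")), margin=1.0)
stable = 1.0  # placeholder for the posture/stability term
reward = stable * move  # matches R(s, a) = stable * tol(v_x, (1, inf), 1)
```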

The hierarchical policy architecture is presented as a schematic (see Figure 1 in the source paper), with an explicit setpoint interface from the high-level policy to a collection of robust, task-invariant low-level controllers.

5. Limitations, Extensions, and Community Impact

HumanoidBench is intentionally constructed as an open-source, standardized simulation platform. Its defining features and intended uses include:

  • Isolation and controlled evaluation of new algorithmic ideas.
  • Task diversity enabling diagnosis of failures in whole-body coordination, contact reasoning, and multi-task generalization.
  • Support for verification of multi-modal integration (vision/proprioception/tactile), reward shaping effectiveness, and sim-to-real transfer strategies.

Resource efficiency (simulation rates of several thousand frames per second with task simplifications) enables rapid empirical iteration, and the standardized methodology supports prompt, reproducible comparison within the robotics community.

Proposed future directions include:

  • Integration of vision and tactile sensors into the agent’s primary observation space for sensor fusion studies.
  • Expanding the suite to include more intricate and real-world relevant tasks (e.g., furniture assembly, screw-driving).
  • Systematic support for domain randomization and sim-to-real pipelines using the MuJoCo MJX engine (a parallel-rollout sketch follows this list).
  • Exploration of imitation learning and learning-from-demonstration regimes as complements to pure RL, as well as further architectural advances in hierarchy and priors.
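One way such MJX-based pipelines are typically structured is sketched below: batched rollouts under jax.vmap with per-environment randomization. The model path is a placeholder, and randomizing initial joint positions stands in for fuller domain randomization (friction, masses, actuator gains).

```python
# Sketch: parallel MJX rollouts with per-environment randomization, as a
# starting point for domain-randomization studies. Model path is a
# placeholder; qpos noise stands in for richer randomization.
import jax
import mujoco
from mujoco import mjx

mj_model = mujoco.MjModel.from_xml_path("h1_scene.xml")  # placeholder
mjx_model = mjx.put_model(mj_model)  # copy model to accelerator memory

def init_and_step(rng):
    data = mjx.make_data(mjx_model)
    # Perturb initial joint positions per environment.
    noise = 0.01 * jax.random.normal(rng, (mjx_model.nq,))
    data = data.replace(qpos=data.qpos + noise)
    return mjx.step(mjx_model, data)

# Step 4096 randomized environments in parallel.
rngs = jax.random.split(jax.random.PRNGKey(0), 4096)
batched_data = jax.jit(jax.vmap(init_and_step))(rngs)
print(batched_data.qpos.shape)  # (4096, nq)
```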

6. Comparative Positioning and Broader Implications

HumanoidBench is situated among a new class of high-dimensional, physics-accurate RL benchmarks intended to advance both robotics and embodied AI. Its focus on dexterous, whole-body control—encompassing both locomotion and fine manipulation with high DoF agents—distinguishes it from earlier RL benchmarks typically centered on either navigation or simple manipulation.

The benchmark catalyzes research on several open problems:

  • Scaling RL in redundant, compound action spaces.
  • Developing structured policies (e.g., with latent or explicit hierarchies) that decompose multi-part skill acquisition.
  • Establishing methodologies for efficient skill/goal transfer and robust low-level controller design under sensor and actuation uncertainty.

With open-source code, reproducibility focus, and consistent evaluation protocols, HumanoidBench sets a rigorous foundation for ongoing research in robust, versatile, and scalable humanoid robot control.
