MuJoCo Menagerie: Diverse Robotic Simulations

Updated 29 November 2025
  • MuJoCo Menagerie is a curated collection of simulated robotic mechanisms featuring diverse morphologies and explicit XML-based specifications.
  • It enables detailed customization of simulation parameters and integration with standardized APIs for consistent benchmarking and reproducible research.
  • Benchmark studies using algorithms like DDPG, SAC, and TQC demonstrate rapid convergence and high success rates across manipulation tasks.

MuJoCo Menagerie is a curated collection of simulated robotic mechanisms designed to provide high-fidelity, diverse testbeds for reinforcement learning, generative modeling, and task-planning research. The suite includes articulated manipulators such as the Franka Emika Panda, quadrupeds, and other robots spanning a broad range of morphologies and kinematic configurations. Menagerie environments are implemented on the MuJoCo physics engine, leveraging its XML-based model specification for explicit control over simulation parameters, and are often integrated with standardized APIs such as Gymnasium Robotics for structured interaction and benchmarking (Xu et al., 2023, Lin et al., 22 Nov 2025, Nazarczuk et al., 4 Mar 2025).

1. Architectural Overview

The MuJoCo Menagerie suite is defined by its focus on kinematic and morphological diversity and direct inspection/modification via XML-based specifications. Each robot or manipulator is described by detailed kinematic chains, including varying numbers of revolute joints—ranging, for example, from three to eight for robot arms—and distinct link geometries. Critical simulation parameters such as friction coefficients, object masses, joint damping, and solver iterations are easily edited in the individual XML files, enabling targeted modifications and domain randomization.
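
As a concrete illustration, the sketch below loads a Menagerie model with the official MuJoCo Python bindings and overrides a few of these parameters programmatically; the file path and the specific values are illustrative rather than prescribed by Menagerie.

```python
import mujoco

# Load a Menagerie model (path is illustrative; substitute any Menagerie XML file).
model = mujoco.MjModel.from_xml_path("mujoco_menagerie/franka_emika_panda/panda.xml")

# Inspect and override simulation parameters normally set in the XML.
model.opt.iterations = 100        # solver iterations
model.geom_friction[:, 0] = 1.0   # sliding friction for every geom
model.dof_damping[:] *= 1.5       # scale joint damping, e.g. for domain randomization

data = mujoco.MjData(model)
mujoco.mj_step(model, data)       # one physics step with the modified parameters
print(data.qpos[:7])              # first seven joint positions (the arm, for the Panda)
```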

Environments commonly expose a standardized Python interface (e.g., via Gymnasium Robotics’s MujocoRobotEnv) and support multi-goal observation dictionaries with the keys “observation”, “achieved_goal”, and “desired_goal” (Xu et al., 2023). For visual and physics-based reasoning, Menagerie environments may be tightly integrated with high-fidelity rendering and sensor stacks (e.g., Blender for photorealistic images in MuBlE (Nazarczuk et al., 4 Mar 2025), or action-conditioned deformation models for kinematics (Lin et al., 22 Nov 2025)).
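
A minimal interaction sketch is given below, assuming a Gymnasium-registered multi-goal environment; the environment ID is hypothetical (it follows the paper's FrankaPush/Slide/PickAndPlace naming), while the dictionary keys follow the Gymnasium Robotics convention described above.

```python
import gymnasium as gym
import gymnasium_robotics  # multi-goal robotics environments; registration details vary by version

# Hypothetical environment ID following the paper's naming.
env = gym.make("FrankaPickAndPlace-v0")

obs, info = env.reset(seed=0)
print(obs["observation"].shape)   # arm/object state vector
print(obs["achieved_goal"])       # current object position
print(obs["desired_goal"])        # sampled target position

action = env.action_space.sample()  # 4D here: Cartesian velocity + gripper command
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```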

2. Manipulation Task Suite: Franka Menagerie Environments

A central subset of MuJoCo Menagerie is the Franka Emika Panda manipulation environments (Xu et al., 2023). These are built on the DeepMind MuJoCo Franka model, which accurately reproduces the 7-DoF arm and 2-finger parallel gripper. Three representative environments (FrankaPush, FrankaSlide, FrankaPickAndPlace) encapsulate canonical tabletop manipulation benchmarks:

  • FrankaPush: The agent pushes an object along a planar, high-friction surface toward a sampled goal. Both object and target are constrained to the table; the success threshold is $\epsilon \approx 0.05$ m and the surface uses MuJoCo's default sliding friction of 1.0.
  • FrankaSlide: The agent must slide a lightweight puck across a slippery surface (sliding friction $\approx 0.3$) into a goal region. The lower friction and mass make the dynamics fast and hard to control, requiring longer training (up to $10^6$ steps).
  • FrankaPickAndPlace: The agent manipulates both arm and gripper to grasp and move objects, with goals sampled between table-level and elevated positions ($p \approx 0.5$). Parameters such as object mass and damping are tuned to stabilize lifting and placement.

Actions are issued as desired velocity commands in 3D Cartesian space; Push and Slide lock the gripper and restrict actions to 3D, while PickAndPlace extends to a 4D action space including gripper opening/closing.
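
The snippet below sketches a naive hand-coded controller in this action space; the assumptions that the first three observation entries are the end-effector position and that actions are unit-scaled velocities are illustrative, not taken from the environment definitions.

```python
import numpy as np

def greedy_push_action(obs, gain=5.0):
    """Proportional controller: command an end-effector velocity toward the object.

    Assumes a 3D velocity action scaled to [-1, 1] (Push/Slide variants) and that the
    first three entries of obs["observation"] are the end-effector position; both are
    illustrative assumptions.
    """
    ee_pos = obs["observation"][:3]
    obj_pos = obs["achieved_goal"]   # object position doubles as the achieved goal
    return np.clip(gain * (obj_pos - ee_pos), -1.0, 1.0)
```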

3. Observation, Reward, and Success Criteria

Menagerie manipulation environments employ a multi-goal observation paradigm, returning:

  • Cartesian position $p_{\rm ee}\in\mathbb{R}^3$ and linear velocity $\dot p_{\rm ee}\in\mathbb{R}^3$ of the end-effector.
  • Object pose $(p_{\rm obj}, \theta_{\rm obj})$ and velocity $(\dot p_{\rm obj}, \omega_{\rm obj})$.
  • For PickAndPlace, gripper finger width.
  • Achieved goal ($p_{\rm obj}$) and desired goal ($g\in\mathbb{R}^3$).

Reward functions support both sparse (binary) and dense (distance-based) feedback:

r_{\rm sparse}(x,g) = \begin{cases} 0, & \|\mathrm{achieved\_goal}-\mathrm{desired\_goal}\| < \epsilon \\ -1, & \text{otherwise} \end{cases}

r_{\rm dense}(x,g) = -\|\mathrm{achieved\_goal}-\mathrm{desired\_goal}\|

Episodes are capped at $H = 50$ steps; early termination/success is triggered when the achieved goal enters the threshold region.
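
A direct transcription of these reward definitions is sketched below, vectorized so it can also serve HER relabeling; the signature loosely follows the Gymnasium Robotics compute_reward convention, and the default threshold of 0.05 m matches the Push task.

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, eps=0.05):
    """0 within the success threshold, -1 otherwise (matches r_sparse above)."""
    d = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.where(d < eps, 0.0, -1.0)

def dense_reward(achieved_goal, desired_goal):
    """Negative Euclidean distance to the goal (matches r_dense above)."""
    return -np.linalg.norm(achieved_goal - desired_goal, axis=-1)
```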

4. Algorithmic Benchmarks and Quantitative Results

Menagerie tasks are benchmarked using off-policy RL algorithms, specifically DDPG, SAC (maximum-entropy policy gradient), and TQC (distributional SAC with truncated quantile critics) as implemented via Stable-Baselines3 (Xu et al., 2023). Hindsight Experience Replay (HER) augments experience with multi-goal relabeling. Key training parameters include buffer sizes ($10^6$), batch sizes (512/2048), network architectures ([256, 256, 256] for Push/PnP; [512, 512, 512] for Slide), $\gamma = 0.95$, $\tau = 0.05$, and action noise $\mathcal{N}(0, 0.2)$.
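
A sketch of this training setup using Stable-Baselines3's SAC with HER relabeling is shown below; the environment ID is hypothetical, TQC would come from sb3-contrib, and the action-noise term is mainly relevant for DDPG.

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("FrankaPush-v0")   # hypothetical ID; see Section 2

model = SAC(
    "MultiInputPolicy",                        # dict observations (observation/achieved/desired goal)
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    buffer_size=1_000_000,
    batch_size=512,
    gamma=0.95,
    tau=0.05,
    policy_kwargs=dict(net_arch=[256, 256, 256]),
    # Exploration noise as listed above; used by DDPG, optional for SAC.
    action_noise=NormalActionNoise(mean=np.zeros(3), sigma=0.2 * np.ones(3)),
    verbose=1,
)
model.learn(total_timesteps=500_000)
```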

Empirical success rates (median over three seeds) at convergence:

  • FrankaPush (500k steps): DDPG ~70%, SAC ~98%, TQC ~99%
  • FrankaSlide (1M steps): DDPG ~60%, SAC ~82%, TQC ~84%
  • FrankaPickAndPlace (500k steps): DDPG ~75%, SAC ~97%, TQC ~98%

SAC and TQC demonstrate rapid improvement and stability, with high success rates attained within 200k–800k steps depending on the task. Explicit return curves are not reported; under the sparse criterion, cumulative reward correlates directly with task success, so success rate serves as the primary metric.

5. Generative Modeling and Kinematics: ArticFlow on Menagerie

The Menagerie suite underpins advanced generative frameworks for articulated mechanism simulation, notably the ArticFlow two-stage flow-matching architecture (Lin et al., 22 Nov 2025). The dataset covers 17 robot arms and several quadrupeds, each sampled for 1,000 joint configurations, rendered as point clouds (4,096 train, 20,000 test points), and standardized via zero-padding and Denavit–Hartenberg normalization.

ArticFlow operates via:

  • Latent flow: A normalizing flow $v_\psi(y, t \mid Z_a)$ transports Gaussian noise to a shape code, conditioning on joint-angle actions.
  • Point flow: $u_\theta(X, t \mid Z_x, Z_a)$ morphs noisy point sets into action-conditioned articulated shapes.
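
The sketch below illustrates the generic conditional flow-matching objective underlying such a point flow (straight-line probability path, velocity regression); the network, conditioning dimensions, and shapes are placeholders, not the ArticFlow architecture.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Placeholder conditional velocity field over point clouds."""
    def __init__(self, point_dim=3, cond_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(point_dim + 1 + cond_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, point_dim),
        )

    def forward(self, x, t, cond):
        # x: (B, N, 3) points, t: (B, 1) times, cond: (B, cond_dim) shape/action code
        B, N, _ = x.shape
        t = t[:, None, :].expand(B, N, 1)
        c = cond[:, None, :].expand(B, N, cond.shape[-1])
        return self.net(torch.cat([x, t, c], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Conditional flow matching: regress the constant velocity of a straight-line path.

    x1: target articulated point cloud (B, N, 3); x0 ~ N(0, I) is the noise endpoint.
    """
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], 1, device=x1.device)
    xt = (1 - t[:, :, None]) * x0 + t[:, :, None] * x1   # point on the interpolation path
    target_v = x1 - x0                                    # its constant velocity
    return ((model(xt, t, cond) - target_v) ** 2).mean()
```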

Performance metrics include Chamfer Distance (CD) and Earth-Mover’s Distance (EMD) against MuJoCo ground truth. ArticFlow achieves per-robot CD of $0.16$–$0.62 \times 10^{-3}$ and EMD of $4.4$–$11.2 \times 10^{-3}$ (arms/quadrupeds), outperforming single-object neural simulators (VSM, CD $12$–$15 \times 10^{-3}$, EMD $50$–$70 \times 10^{-3}$) and action-conditioned PointFlow baselines. The disentanglement of morphology (latent flow) and action-dependent kinematics (point flow) is substantiated by both quantitative and visual alignment fidelity.
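
For reference, a minimal Chamfer distance implementation is sketched below; conventions differ across papers (squared vs. unsquared distances, sum vs. mean over the two directions), and EMD additionally requires an optimal-transport solver, so only CD is shown.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N, 3) and b (M, 3).

    Uses mean unsquared nearest-neighbor distances in both directions; other
    conventions rescale the reported values.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```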

6. Environment Parameters, Fidelity, and Reproducibility

Parameters such as object mass, friction, joint torque limits, target-sampling bounds, and success thresholds are directly user-modifiable within each task’s XML, facilitating controlled experimentation and robust reproduction. The Menagerie usage pattern prescribes uniform seeding, multi-goal API conventions, and standardized evaluation protocols (e.g., reporting over 15 episodes per evaluation, consistent random seeds).
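
A minimal evaluation loop following this protocol might look as follows; the `is_success` info key is the Gymnasium Robotics convention and is assumed here rather than guaranteed for every Menagerie task.

```python
import numpy as np

def evaluate(env, policy, n_episodes=15, seed=0):
    """Report success rate over a fixed number of seeded episodes (15 by default).

    `policy` is any callable mapping the observation dict to an action.
    """
    successes = []
    for ep in range(n_episodes):
        obs, info = env.reset(seed=seed + ep)   # deterministic goal/object sampling per episode
        terminated = truncated = False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
        successes.append(float(info.get("is_success", 0.0)))
    return np.mean(successes)
```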

The underlying Franka URDF and contact models are calibrated for realism, with explicit future work directed at tightening sim-to-real correspondence via further solver and contact model tuning. Full codebases and XML files are available in open source repositories, and identical training regimes yield reproducible learning curves under fixed configuration (Xu et al., 2023).

7. Applications and Extensions

MuJoCo Menagerie environments serve as canonical testbeds for RL algorithm development, function as data sources for generative modeling of complex kinematics, and provide standardized benchmarks for embodied agents in task planning and manipulation (Nazarczuk et al., 4 Mar 2025). The modularity of the environment specification underpins integration with diverse frameworks (e.g., Gymnasium, robosuite, Blender-based rendering pipelines) and extends to multimodal inputs (RGB, depth, force/torque). Menagerie’s XML-driven customization and morphological diversity support closed-loop reasoning, sim-to-real research, and scalable benchmarking in simulation-centric robotics.
