
Robosuite Dataset: Robotic Manipulation Benchmarks

Updated 20 December 2025
  • The robosuite dataset is a multimodal collection built around nine standardized robotic manipulation tasks simulated in the MuJoCo physics engine.
  • It provides reproducible, configurable streams of sensor, state, and action data for both scripted demonstrations and reinforcement learning experiments.
  • The dataset supports multiple data modalities, including low-dimensional proprioception and high-dimensional vision, so policies can be benchmarked consistently across observation types.

The robosuite dataset is a multimodal, on-demand collection protocol built around a standardized suite of nine robotic manipulation benchmarks. It is tightly integrated with the robosuite simulation and benchmarking framework, which operates atop the MuJoCo physics engine. Rather than shipping static demonstration files, robosuite establishes a reproducible, extensible environment specification—the “dataset” consists of (1) parametrizable benchmark tasks; (2) configurable streams of sensor, state, and action data from simulated environments; and (3) tools for episodic data recording, serialization, and benchmarking within a rigorous evaluation protocol (Zhu et al., 2020).

1. Benchmark Environments and Task Scope

robosuite v1.0 provides nine benchmark environments (“tasks”), divided into single-arm and two-arm settings. Each environment exposes a Gym-style API—reset(), step()—with environment initializations sampled stochastically by a placement_initializer, ensuring varied, collision-free configurations on each episode reset.
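
A minimal sketch of this loop; the ALL_ENVIRONMENTS registry attribute, the "Lift" task name, and the action_spec property follow recent robosuite releases and are assumptions here:

import numpy as np
import robosuite as suite

# Registered benchmark environments (e.g., Lift, Stack, PickPlace, NutAssembly, Door, Wipe, TwoArm*)
print(sorted(suite.ALL_ENVIRONMENTS))

env = suite.make("Lift", robots="Panda", has_renderer=False,
                 has_offscreen_renderer=False, use_camera_obs=False)
obs = env.reset()                      # placement_initializer samples a fresh, collision-free layout
low, high = env.action_spec            # per-dimension action bounds for the active controller
obs, reward, done, info = env.step(np.random.uniform(low, high))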

Single-Arm Tasks

  • Block Lifting: 7-DoF arm (default Panda), single 0.05 m cube on tabletop; (x, y) cube position randomized in 0.1 m disk in front of gripper; success if $z_{\text{cube}} > h_{\text{thresh}}$.
  • Block Stacking: Single arm, two identical cubes; randomized, non-colliding starting positions; success when cube A is stably placed on cube B.
  • Pick-and-Place: Single arm, up to four objects and four receptacles; randomized subset-object and receptacle assignments; single-object variants provided.
  • Nut Assembly: Single arm, two pegs and two nuts; randomized nut locations; goal is correct insertion on each peg; single-nut variants supported.
  • Door Opening: Single arm, hinged door with cylindrical handle; random pose (translation plus yaw) of door; success for opening angle $> \pi/4$.
  • Table Wiping: Arm with eraser end-effector; whiteboard tabletop with randomized smear regions; must clear $> 90\%$ of marked area for success.

Two-Arm Tasks

  • Two-Arm Lifting: Bimanual (Panda×2 or Sawyer×2); pot with two handles; random pot (x, y); arms start co-located or opposite; lift pot above threshold while keeping pitch and roll $< \varepsilon$.
  • Two-Arm Peg-In-Hole: Bimanual; square-holed board (arm1) and peg (arm2); random end-effector poses; insert peg through board.
  • Two-Arm Handover: Bimanual; hammer object, randomized size and location; arm1 grasps and passes hammer to arm2; success if hammer ends in gripper2.

All environments utilize per-task placement samplers for stochasticity and reproducibility.
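
Samplers can also be overridden at construction time. A sketch assuming robosuite's UniformRandomSampler and the placement_initializer keyword accepted by the single-arm tasks (exact argument names may vary by version):

import robosuite as suite
from robosuite.utils.placement_samplers import UniformRandomSampler

# Widen the cube's start-state distribution for Block Lifting (assumed kwarg names)
sampler = UniformRandomSampler(
    name="ObjectSampler",
    x_range=[-0.15, 0.15],                  # metres, relative to the table centre
    y_range=[-0.15, 0.15],
    rotation=None,                          # uniform random yaw
    ensure_object_boundary_in_range=False,
    ensure_valid_placement=True,
    z_offset=0.01,
)

env = suite.make("Lift", robots="Panda", placement_initializer=sampler,
                 has_renderer=False, has_offscreen_renderer=False, use_camera_obs=False)
obs = env.reset()   # cube position now drawn from the wider disk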

Task | Default Robot(s) | Object(s) | Start-State Variations
---|---|---|---
Block Lifting | Panda (7-DoF arm) | Single cube | (x, y) in 0.1 m disk
Block Stacking | Single arm | Two cubes | Non-colliding, randomized (x, y)
Pick-and-Place | Single arm | Up to 4 objects, 4 bins | Object/container assignment, subsets
Nut Assembly | Single arm | 2 pegs, 2 nuts | Nut positions randomized
Door Opening | Single arm | Hinged door | Door pose (translation + yaw) randomized
Table Wiping | Single arm, eraser | "Whiteboard" surface | Smear regions randomized
Two-Arm Lifting | Two arms | Pot with handles | Pot (x, y), arm starting pose
Two-Arm Peg-In-Hole | Two arms | Board w/ hole, peg | Random EE poses
Two-Arm Handover | Two arms | Hammer | Hammer size, pose randomized

2. Data Modalities and Observational Structure

robosuite’s observation and sensor suite is modular and configurable at environment instantiation, supporting low-dimensional proprioception, rich vision, and extensible sensor channels.

Proprioceptive (Low-Dim) Modalities

  • Joint positions $q_t \in \mathbb{R}^{n}$
  • Joint velocities $\dot{q}_t \in \mathbb{R}^{n}$
  • Joint torques $\tau_t \in \mathbb{R}^{n}$
  • End-effector position $p_{ee,t} \in \mathbb{R}^{3}$ and orientation $R_{ee,t} \in SO(3)$ or $q_{ee,t} \in S^3$
  • Force/torque at the wrist $f_t, \mu_t \in \mathbb{R}^{3}$
  • Pose of each object $i$: $p_{i,t}$, $R_{i,t}$

Vision

  • RGB images: $I^{\text{rgb}}_{c,t} \in [0, 255]^{H \times W \times 3}$ (one or more cameras)
  • Depth: $I^{\text{depth}}_{c,t} \in \mathbb{R}^{H \times W}$

Other Channels

  • Finger/contact pressures (where implemented)
  • Custom MuJoCo sensor outputs

Observations in the low-dim setting (use_object_obs=True, use_camera_obs=False) are concatenated vectors:

$o_t = [\,q_t;\ \dot{q}_t;\ p_{ee,t};\ \mathrm{vec}(R_{ee,t});\ \{p_{i,t},\ \mathrm{vec}(R_{i,t})\}_{i=1}^{N_o}\,] \in \mathbb{R}^d$

Activating camera-based modes (use_camera_obs=True) adds per-camera entries to the observation dictionary; in robosuite's default naming these appear as obs['<camera_name>_image'] and obs['<camera_name>_depth'], e.g., obs['agentview_image'].
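
A sketch of assembling the flat low-dimensional vector from the observation dictionary; the key names (robot0_proprio-state, object-state, agentview_image) follow robosuite's default observable naming and are assumptions here:

import numpy as np
import robosuite as suite

# Low-dimensional configuration: object poses on, cameras off
env = suite.make("Lift", robots="Panda", use_object_obs=True, use_camera_obs=False,
                 has_renderer=False, has_offscreen_renderer=False)
obs = env.reset()

# Proprioception and object poses are exposed both as individual keys and as
# pre-concatenated "-state" vectors; stacking the two yields the flat o_t above.
o_t = np.concatenate([obs["robot0_proprio-state"], obs["object-state"]])

# With use_camera_obs=True and camera_names=["agentview"], per-camera arrays appear as
#   obs["agentview_image"]  (H, W, 3) uint8 RGB
#   obs["agentview_depth"]  depth map (requires camera_depths=True)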

3. Data Generation and Collection Protocols

Data is generated dynamically—no static demonstration archives are provided. Two canonical sources are supported:

Scripted / Demonstration Data

  • Human teleoperation using SpaceMouse or keyboard provides action commands.
  • Data streams $(o_t, a_t, r_t, \text{done}_t)$ are recorded at control frequency (default 20 Hz).
  • Data are saved by user code as .npz (NumPy) or HDF5 files.

Reinforcement Learning Data

  • Off-policy methods (e.g., SAC) generate trajectory rollouts (e.g., $T = 500$ steps, 500 epochs).
  • Evaluation episodes are interleaved at set intervals, matching task horizons.
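
Off-the-shelf RL libraries usually expect a flat Gym-style interface; robosuite ships a GymWrapper for this purpose. A minimal sketch (the keys argument selecting which observation entries are flattened is assumed to match the current wrapper API):

import robosuite as suite
from robosuite.wrappers import GymWrapper

# Wrap a low-dim Lift task so an off-policy learner (e.g., SAC) can consume it directly
env = GymWrapper(
    suite.make("Lift", robots="Panda", use_camera_obs=False,
               has_renderer=False, has_offscreen_renderer=False,
               reward_shaping=True, horizon=500, control_freq=20),
    keys=["robot0_proprio-state", "object-state"],   # observation keys flattened into one vector
)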

Reproducibility

  • Seeding the environment's randomness before env.reset() (the placement samplers draw from NumPy's global RNG) fixes initial-state sampling for consistent stochasticity.
  • Benchmarks report performance as mean ± std over 5 seeds; all code and hyperparameters are versioned with the benchmark suite.
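
As a sketch of the seeding contract, assuming the placement samplers draw from NumPy's global RNG and that the Lift observation dictionary exposes a cube_pos key:

import numpy as np
import robosuite as suite

def initial_cube_position(seed):
    np.random.seed(seed)                      # fix the RNG the placement sampler draws from
    env = suite.make("Lift", robots="Panda", use_camera_obs=False,
                     has_renderer=False, has_offscreen_renderer=False)
    obs = env.reset()
    return obs["cube_pos"]                    # assumed observable name for the cube position

# Identical seeds should reproduce the identical sampled start state
assert np.allclose(initial_cube_position(0), initial_cube_position(0))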

4. Data Storage, Access, and Serialization

robosuite emphasizes online data instantiation and flexible serialization:

  • Environments are specified in /robosuite/environments/, assets in /robosuite/models/.
  • Demonstration scripts (e.g., /examples/record_demonstrations.py) support human demonstration recording to .npz.
  • Benchmarking and data logging scripts are found under /benchmark/.
  • Data output formats: np.savez() (NumPy archives) or HDF5 (via h5py).

Example Workflow

import robosuite as suite
from robosuite.controllers import load_controller_config
import numpy as np

controller_cfg = load_controller_config(default_controller="OSC_POSE")
env = suite.make(
    "PickPlace",
    robots="Panda",
    use_camera_obs=False,
    has_renderer=False,
    has_offscreen_renderer=False,
    reward_shaping=True,
    controller_configs=controller_cfg,
    control_freq=20,
    horizon=500,
)
np.random.seed(0)                       # placement samplers draw from NumPy's global RNG
obs = env.reset()
episode = {'obs': [], 'act': [], 'rew': [], 'info': []}
done = False
low, high = env.action_spec             # per-dimension action bounds for the active controller
while not done:
    action = np.random.uniform(low, high)   # placeholder; substitute your policy's action here
    next_obs, reward, done, info = env.step(action)
    episode['obs'].append(obs)
    episode['act'].append(action)
    episode['rew'].append(reward)
    episode['info'].append(info)
    obs = next_obs
np.savez('traj_seed0.npz',
         obs=np.array(episode['obs']),                  # dict observations become an object array
         act=np.array(episode['act']),
         rew=np.array(episode['rew']),
         info=np.array(episode['info'], dtype=object))  # load later with allow_pickle=True
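
For HDF5 output via h5py, an equivalent sketch that continues from the episode dictionary above (the group and dataset layout is illustrative, not a robosuite-mandated schema):

import h5py

with h5py.File("traj_seed0.hdf5", "w") as f:
    grp = f.create_group("episode_0")
    grp.attrs["env"] = "PickPlace"
    grp.attrs["control_freq"] = 20
    grp.create_dataset("actions", data=np.array(episode["act"]))
    grp.create_dataset("rewards", data=np.array(episode["rew"]))
    # observation dicts are stored per key so each array keeps a homogeneous dtype
    obs_grp = grp.create_group("observations")
    for key in episode["obs"][0]:
        obs_grp.create_dataset(key, data=np.stack([o[key] for o in episode["obs"]]))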

5. Evaluation Metrics and Benchmarking Protocol

Task-specific success criteria, diverse reward shaping, and standardized reporting ensure consistent benchmarking:

Success Criteria

  • Boolean task-specific flags in info['success'] at $t_\text{end}$ or done=True
  • Examples:
    • Block Lifting: $z_{\text{cube}} - z_{\text{table}} > \delta_h$
    • Block Stacking: $\|p_A - p_B - [0, 0, h_{\text{cube}}]^\top\| < \varepsilon_{\text{pos}}$ and orientation error $< \varepsilon_{\text{ori}}$
    • Table Wiping: $\frac{\text{cleaned area}}{\text{total area}} > 0.9$

Reward Functions

robosuite supports both sparse and dense reward schemes, e.g.,

$r_t = -\|p_{ee,t} - p_{\text{target}}\|^2$

Orientation penalties can use quaternion geodesics, e.g., with $q^{*}_{ee,t}$ the conjugate of the target orientation,

$r^{\text{ori}}_t = -\|\mathrm{Log}(q^{*}_{ee,t} \otimes q_{ee,t})\|^2$

Composite shaping may weight terms for position, orientation, and binary grasp success.
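
A sketch of these shaping terms in NumPy; the composite weights w_pos, w_ori, and w_grasp are illustrative, not robosuite defaults:

import numpy as np

def position_term(p_ee, p_target):
    # dense shaping: negative squared Euclidean distance to the target position
    return -np.sum((p_ee - p_target) ** 2)

def orientation_term(q_ee, q_target):
    # geodesic angle between unit quaternions, sign-invariant in the double cover
    dot = np.clip(abs(np.dot(q_ee, q_target)), 0.0, 1.0)
    return -(2.0 * np.arccos(dot)) ** 2

def shaped_reward(p_ee, q_ee, p_target, q_target, grasped,
                  w_pos=1.0, w_ori=0.1, w_grasp=0.5):
    # composite shaping: weighted position/orientation penalties plus a binary grasp bonus
    return (w_pos * position_term(p_ee, p_target)
            + w_ori * orientation_term(q_ee, q_target)
            + w_grasp * float(grasped))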

Evaluation Protocols

  • 20 evaluation episodes per checkpoint
  • Task horizon: 500 steps (25 s at 20 Hz)
  • Reported metrics: Mean return ± std (5 seeds), success rate (% success episodes)
  • Visualization: learning curves, bar charts for success rates
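
A sketch of the reporting step, aggregating evaluation episodes per checkpoint and across seeds; the info['success'] flag is the task-specific indicator described above, and make_env/policy are placeholders:

import numpy as np

def evaluate(env, policy, n_episodes=20):
    returns, successes = [], []
    for _ in range(n_episodes):
        obs, done, ep_ret, ep_success = env.reset(), False, 0.0, False
        while not done:
            obs, reward, done, info = env.step(policy(obs))
            ep_ret += reward
            ep_success = ep_success or bool(info.get("success", False))
        returns.append(ep_ret)
        successes.append(ep_success)
    return np.mean(returns), np.mean(successes)

# Report mean return ± std and success rate across 5 seeds:
# per_seed = [evaluate(make_env(seed), policy) for seed in range(5)]
# rets, succ = zip(*per_seed)
# print(f"return: {np.mean(rets):.1f} ± {np.std(rets):.1f}, success: {100 * np.mean(succ):.0f}%")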

6. Usage Guidance and Reproducibility Best Practices

robosuite prescribes methodological practices for robust data collection and benchmarking:

  • Use impedance-based OSC_POSE controllers for sample-efficient learning.
  • Set random seeds (e.g., NumPy's global RNG) before env.reset(); record them in logs and metadata.
  • Prefer low-dim state for sim-to-real; enable use_camera_obs (together with has_offscreen_renderer=True) for vision; see the vision sketch after this list.
  • Collect sparse rewards initially for exploration; introduce dense shaping incrementally.
  • Align experiment horizons with baseline (≥500 steps).
  • Record episodic data in compressed .npz or HDF5; maintain code/hyperparameter provenance.
  • Typical workload: 20 Hz control, 12 GB RAM, 2 days for 5 seeds × 9 tasks × 500 epochs × 500 steps (no GPU required unless using vision-based policies).
  • Community collaboration and updates are facilitated via the robosuite repository and robosuite.ai.
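
A sketch of the vision-enabled configuration referenced above; the camera_names, camera_heights, camera_widths, and camera_depths keyword arguments follow robosuite's environment interface, with illustrative values:

import robosuite as suite
from robosuite.controllers import load_controller_config

# Vision-based variant: offscreen rendering must be enabled for camera observations
env = suite.make(
    "Lift",
    robots="Panda",
    controller_configs=load_controller_config(default_controller="OSC_POSE"),
    use_camera_obs=True,
    has_offscreen_renderer=True,
    camera_names=["agentview"],
    camera_heights=84,
    camera_widths=84,
    camera_depths=True,        # also return per-camera depth maps
    control_freq=20,
    horizon=500,
)
obs = env.reset()
rgb = obs["agentview_image"]   # (84, 84, 3) uint8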

The robosuite dataset and its collection protocol operationalize a reproducible, extensible standard for robotic learning research, allowing researchers to instantiate, instrument, and analyze a wide array of manipulation tasks within a unified framework (Zhu et al., 2020).

References

Zhu, Y., Wong, J., Mandlekar, A., and Martín-Martín, R. (2020). robosuite: A Modular Simulation Framework and Benchmark for Robot Learning. arXiv:2009.12293.
