Robosuite Dataset: Robotic Manipulation Benchmarks
- The robosuite dataset is a multimodal collection built around nine standardized robotic manipulation tasks simulated with the MuJoCo physics engine.
- It provides reproducible, configurable streams of sensor, state, and action data for both scripted demonstrations and reinforcement learning experiments.
- It spans multiple data modalities, including low-dimensional proprioception and high-dimensional vision, enabling consistent performance benchmarking.
The robosuite dataset is a multimodal, on-demand collection protocol built around a standardized suite of nine robotic manipulation benchmarks. It is tightly integrated with the robosuite simulation and benchmarking framework, which operates atop the MuJoCo physics engine. Rather than shipping static demonstration files, robosuite establishes a reproducible, extensible environment specification—the “dataset” consists of (1) parametrizable benchmark tasks; (2) configurable streams of sensor, state, and action data from simulated environments; and (3) tools for episodic data recording, serialization, and benchmarking within a rigorous evaluation protocol (Zhu et al., 2020).
1. Benchmark Environments and Task Scope
robosuite v1.0 provides nine benchmark environments (“tasks”), divided into single-arm and two-arm settings. Each environment exposes a Gym-style API—reset(), step()—with environment initializations sampled stochastically by a placement_initializer, ensuring varied, collision-free configurations on each episode reset.
Single-Arm Tasks
- Block Lifting: 7-DoF arm (default Panda), single 0.05 m cube on tabletop; (x, y) cube position randomized in 0.1 m disk in front of gripper; success when the cube is lifted above a height threshold over the table surface.
- Block Stacking: Single arm, two identical cubes; randomized, non-colliding starting positions; success when cube A is stably placed on cube B.
- Pick-and-Place: Single arm, up to four objects and four receptacles; randomized object subsets and object-to-receptacle assignments; single-object variants provided.
- Nut Assembly: Single arm, two pegs and two nuts; randomized nut locations; goal is correct insertion on each peg; single-nut variants supported.
- Door Opening: Single arm, hinged door with cylindrical handle; random pose (translation plus yaw) of door; success when the door's opening angle exceeds a threshold.
- Table Wiping: Arm with eraser end-effector; whiteboard tabletop with randomized smear regions; must clear the marked area for success.
Two-Arm Tasks
- Two-Arm Lifting: Bimanual (Panda×2 or Sawyer×2); pot with two handles; random pot (x, y); arms start co-located or opposite; lift the pot above a height threshold while keeping its pitch and roll within a small tolerance.
- Two-Arm Peg-In-Hole: Bimanual; square-holed board (arm1) and peg (arm2); random end-effector poses; insert peg through board.
- Two-Arm Handover: Bimanual; hammer object, randomized size and location; arm1 grasps and passes hammer to arm2; success if hammer ends in gripper2.
All environments utilize per-task placement samplers for stochasticity and reproducibility.
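The effect of these placement samplers can be seen by resetting an environment repeatedly and inspecting the object state. The following is a minimal sketch, assuming the default Lift configuration and the standard object-state observation key; the leading entries of that vector encode the re-sampled cube position.

```python
import numpy as np
import robosuite as suite

# Minimal sketch: each reset() draws a fresh, collision-free object placement
# from the task's placement sampler (default Lift configuration assumed).
env = suite.make(
    "Lift",
    robots="Panda",
    use_camera_obs=False,
    has_renderer=False,
    has_offscreen_renderer=False,
)

for episode in range(3):
    obs = env.reset()
    # The object-state vector changes across resets because the cube's
    # (x, y) position is re-sampled by the placement initializer.
    print(f"episode {episode}: object-state head = {np.round(obs['object-state'][:3], 3)}")

env.close()
```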
| Task | Default Robot(s) | Object(s) | Start-State Variations |
|---|---|---|---|
| Block Lifting | Panda (7-DoF arm) | 0.05 m cube | (x, y) in 0.1 m disk |
| Block Stacking | Single arm | Two cubes | Non-colliding, randomized (x, y) |
| Pick-and-Place | Single arm | Up to 4 objects, 4 bins | Object/container assignment, subsets |
| Nut Assembly | Single arm | 2 pegs, 2 nuts | Nut positions randomized |
| Door Opening | Single arm | Hinged door | Door pose (translation+yaw) randomized |
| Table Wiping | Single arm, eraser | Surface “whiteboard” | Smear regions randomized |
| Two-Arm Lifting | Two arms | Pot with handles | Pot (x, y), arm starting pose |
| Two-Arm Peg-In-Hole | Two arms | Board w/ hole, peg | Random EE poses |
| Two-Arm Handover | Two arms | Hammer | Hammer size, pose randomized |
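As an illustration of the task scope summarized above, the sketch below instantiates one single-arm and one two-arm benchmark through the same suite.make() entry point; the particular task and robot choices are examples, and two-arm tasks accept a list of robot models (one per arm).

```python
import robosuite as suite

# Single-arm task: a single robot model string.
single_arm_env = suite.make(
    "NutAssembly",
    robots="Sawyer",
    use_camera_obs=False,
    has_renderer=False,
    has_offscreen_renderer=False,
)

# Two-arm task: one robot model per arm.
two_arm_env = suite.make(
    "TwoArmLift",
    robots=["Panda", "Panda"],
    use_camera_obs=False,
    has_renderer=False,
    has_offscreen_renderer=False,
)

# The two-arm environment exposes a correspondingly larger action space.
print(single_arm_env.action_dim, two_arm_env.action_dim)
```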
2. Data Modalities and Observational Structure
robosuite’s observation and sensor suite is modular and configurable at environment instantiation, supporting low-dimensional proprioception, rich vision, and extensible sensor channels.
Proprioceptive (Low-Dim) Modalities
- Joint positions
- Joint velocities
- Joint torques
- End-effector position $p_{\mathrm{ee}} \in \mathbb{R}^3$ and orientation as a quaternion $q_{\mathrm{ee}} \in \mathbb{R}^4$ or Euler angles
- Force/torque at wrist
- Pose of each object $i$: position $p_i \in \mathbb{R}^3$, orientation quaternion $q_i \in \mathbb{R}^4$
Vision
- RGB images: $H \times W \times 3$ arrays (one or more cameras)
- Depth: $H \times W$ depth maps aligned with each RGB camera
Other Channels
- Finger/contact pressures (where implemented)
- Custom MuJoCo sensor outputs
Observations in the low-dim setting (use_object_obs=True, use_camera_obs=False) are concatenated vectors of the proprioceptive and object-state features, e.g. $o_t = [\,q,\ \dot{q},\ p_{\mathrm{ee}},\ q_{\mathrm{ee}},\ p_{\mathrm{obj}},\ q_{\mathrm{obj}}\,]$.
Activating camera-based modes appends per-camera image entries, e.g. obs['agentview_image'] and obs['agentview_depth'] for the default agentview camera, to the observation dictionary.
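A short sketch of inspecting this observation dictionary follows; the Lift task and the robot0_proprio-state / object-state key names are typical of robosuite v1.x but should be verified against the installed version.

```python
import numpy as np
import robosuite as suite

# Minimal sketch: inspect the modalities returned in the observation dictionary
# for a low-dim configuration (no camera observations).
env = suite.make(
    "Lift",
    robots="Panda",
    use_object_obs=True,       # low-dim object pose features
    use_camera_obs=False,      # no image channels in this configuration
    has_renderer=False,
    has_offscreen_renderer=False,
)

obs = env.reset()
for key, value in obs.items():
    print(key, np.shape(value))
# Typical entries: robot0_proprio-state (proprioception) and object-state
# (object poses); with use_camera_obs=True, per-camera entries such as
# agentview_image / agentview_depth are appended.

# Concatenated low-dim vector as described above.
o_t = np.concatenate([obs["robot0_proprio-state"], obs["object-state"]])
print(o_t.shape)
```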
3. Data Generation and Collection Protocols
Data is generated dynamically—no static demonstration archives are provided. Two canonical sources are supported:
Scripted / Demonstration Data
- Human teleoperation using SpaceMouse or keyboard provides action commands.
- Data streams are recorded at control frequency (default 20 Hz).
- Data are saved by user code as `.npz` (NumPy) or HDF5 files.
Reinforcement Learning Data
- Off-policy methods (e.g., SAC) generate trajectory rollouts (e.g., 500-step episodes over 500 training epochs).
- Evaluation episodes are interleaved at set intervals, matching task horizons.
Reproducibility
- Setting random seeds (e.g., NumPy's global RNG, which the placement initializers draw from) before env.reset() yields consistent stochasticity.
- Benchmarks report performance as mean ± std over 5 seeds; all code and hyperparameters are versioned with the benchmark suite.
4. Data Storage, Access, and Serialization
robosuite emphasizes online data instantiation and flexible serialization:
- Environments are specified in `/robosuite/envs/`, assets in `/robosuite/models/`.
- Demonstration scripts (e.g., `/examples/record_demonstrations.py`) support recording human demonstrations to `.npz`.
- Benchmarking and data-logging scripts are found under `/benchmark/`.
- Data output formats: `np.savez()` (NumPy archives) or HDF5 (via `h5py`); an HDF5 sketch follows the example workflow below.
Example Workflow
```python
import numpy as np
import robosuite as suite
from robosuite.controllers import load_controller_config

# Operational-space pose controller (OSC_POSE), as recommended in Section 6.
controller_cfg = load_controller_config(default_controller="OSC_POSE")

env = suite.make(
    "PickPlace",
    robots="Panda",
    use_camera_obs=False,        # low-dim observations only
    has_renderer=False,
    has_offscreen_renderer=False,
    reward_shaping=True,         # dense shaped reward
    controller_configs=controller_cfg,
    control_freq=20,
    horizon=500,
)

np.random.seed(0)                # placement initializers draw from NumPy's RNG
obs = env.reset()

episode = {"obs": [], "act": [], "rew": []}
low, high = env.action_spec     # per-dimension action bounds
done = False
while not done:
    # Stand-in for a user policy: sample a random action within bounds.
    action = np.random.uniform(low, high)
    next_obs, reward, done, info = env.step(action)
    # Store the flattened low-dim observation (proprioception + object state).
    episode["obs"].append(
        np.concatenate([obs["robot0_proprio-state"], obs["object-state"]])
    )
    episode["act"].append(action)
    episode["rew"].append(reward)
    obs = next_obs

np.savez(
    "traj_seed0.npz",
    obs=np.array(episode["obs"]),
    act=np.array(episode["act"]),
    rew=np.array(episode["rew"]),
)
env.close()
```
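The episode dictionary assembled above could equally be serialized to HDF5, as noted under data output formats; the group and attribute names in this sketch are illustrative conventions, not a robosuite-defined schema.

```python
import h5py
import numpy as np

# Illustrative HDF5 serialization of the `episode` dictionary from the workflow
# above; dataset/group names are arbitrary conventions assumed here.
with h5py.File("traj_seed0.hdf5", "w") as f:
    grp = f.create_group("episode_0")
    grp.attrs["seed"] = 0
    grp.attrs["control_freq"] = 20
    grp.create_dataset("obs", data=np.array(episode["obs"]), compression="gzip")
    grp.create_dataset("act", data=np.array(episode["act"]), compression="gzip")
    grp.create_dataset("rew", data=np.array(episode["rew"]), compression="gzip")
```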
5. Evaluation Metrics and Benchmarking Protocol
Task-specific success criteria, diverse reward shaping, and standardized reporting ensure consistent benchmarking:
Success Criteria
- Boolean task-specific flags in `info['success']` and/or signaled via `done=True`
- Examples:
  - Block Lifting: the cube's height exceeds a threshold above the table surface
  - Block Stacking: cube A rests on cube B within position and orientation-error tolerances
  - Table Wiping: the marked (smeared) regions have been fully cleared
Reward Functions
robosuite supports both sparse and dense reward schemes, e.g., a sparse success indicator $r_t = \mathbb{1}[\text{success}]$ or a dense reaching term such as $r_t = 1 - \tanh(10\,d_t)$, where $d_t$ is the gripper-to-object distance.
Orientation penalties can use quaternion geodesics, e.g., $\theta = 2\arccos\!\left(\lvert \langle q_1, q_2 \rangle \rvert\right)$ between the current and target orientations.
Composite shaping may weight terms for position, orientation, and binary grasp success.
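A minimal sketch of such shaping terms is given below; the weights, the tanh scale, and the function names are illustrative choices, not robosuite's exact per-task reward implementations.

```python
import numpy as np

def reaching_reward(gripper_pos, object_pos):
    """Dense distance term 1 - tanh(10 * d), bounded in [0, 1]."""
    d = np.linalg.norm(np.asarray(gripper_pos) - np.asarray(object_pos))
    return 1.0 - np.tanh(10.0 * d)

def quat_geodesic(q1, q2):
    """Geodesic angle between two unit quaternions."""
    dot = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))

def shaped_reward(gripper_pos, object_pos, q_ee, q_goal, grasped, success,
                  w_pos=0.5, w_ori=0.2, w_grasp=0.3):
    """Composite shaping: weighted position, orientation, and binary grasp
    terms, plus the sparse success bonus on top."""
    r = w_pos * reaching_reward(gripper_pos, object_pos)
    r += w_ori * (1.0 - quat_geodesic(q_ee, q_goal) / np.pi)
    r += w_grasp * float(grasped)
    return r + float(success)   # sparse indicator 1[success]
```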
Evaluation Protocols
- 20 evaluation episodes per checkpoint
- Task horizon: 500 steps (25 s at 20 Hz)
- Reported metrics: mean return ± std (5 seeds), success rate (% of successful episodes)
- Visualization: learning curves, bar charts for success rates
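The sketch below mirrors this protocol for the Lift task with a random stand-in policy; the use of the environment's internal _check_success() helper (rather than logged info flags) and the specific task choice are assumptions made for illustration.

```python
import numpy as np
import robosuite as suite

def evaluate(env, n_episodes=20):
    """Run evaluation episodes and return (mean return, success rate)."""
    returns, successes = [], []
    low, high = env.action_spec
    for _ in range(n_episodes):
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            action = np.random.uniform(low, high)   # replace with a trained policy
            obs, reward, done, info = env.step(action)
            ep_return += reward
        returns.append(ep_return)
        successes.append(float(env._check_success()))
    return np.mean(returns), np.mean(successes)

results = []
for seed in range(5):
    np.random.seed(seed)                            # one environment per seed
    env = suite.make("Lift", robots="Panda", use_camera_obs=False,
                     has_renderer=False, has_offscreen_renderer=False,
                     horizon=500, control_freq=20, reward_shaping=True)
    results.append(evaluate(env))
    env.close()

mean_returns = [r for r, _ in results]
success_rates = [s for _, s in results]
print(f"return: {np.mean(mean_returns):.1f} ± {np.std(mean_returns):.1f} (5 seeds)")
print(f"success rate: {100 * np.mean(success_rates):.0f}%")
```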
6. Usage Guidance and Reproducibility Best Practices
robosuite prescribes methodological practices for robust data collection and benchmarking:
- Use impedance-based OSC_POSE controllers for sample-efficient learning.
- Set random seeds (e.g., NumPy's global RNG, used by the placement initializers) before env.reset(); record these in logs and metadata.
- Prefer low-dim state for sim-to-real studies; enable use_camera_obs and has_offscreen_renderer for vision.
- Collect sparse rewards initially for exploration; introduce dense shaping incrementally.
- Align experiment horizons with baseline (≥500 steps).
- Record episodic data in compressed `.npz` or HDF5; maintain code/hyperparameter provenance (a provenance sketch follows this list).
- Typical workload: 20 Hz control, 12 GB RAM, 2 days for 5 seeds × 9 tasks × 500 epochs × 500 steps (no GPU required unless using vision-based policies).
- Community collaboration and updates are facilitated via the robosuite repository and robosuite.ai.
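A minimal provenance sketch matching the seeding and logging recommendations above is shown here; the metadata schema and file names are conventions assumed for illustration rather than part of robosuite.

```python
import json
import subprocess

# Illustrative provenance record stored alongside each trajectory archive.
try:
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
except Exception:
    commit = "unknown"          # not running inside a git checkout

metadata = {
    "task": "PickPlace",
    "robots": "Panda",
    "controller": "OSC_POSE",
    "control_freq": 20,
    "horizon": 500,
    "seed": 0,
    "git_commit": commit,       # code version for reproducibility
}

with open("traj_seed0.meta.json", "w") as f:
    json.dump(metadata, f, indent=2)
```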
The robosuite dataset and its collection protocol operationalize a reproducible, extensible standard for robotic learning research, allowing researchers to instantiate, instrument, and analyze a wide array of manipulation tasks within a unified framework (Zhu et al., 2020).