RoboTwin-OD: 3D Object Dataset

Updated 9 April 2026

RoboTwin-OD is a large-scale, semantically and functionally annotated 3D object dataset designed for dual-arm robotic manipulation research.
It leverages scalable 3D reconstruction, mesh merging, and integration of third-party assets to compile 731 instances across 147 semantic categories.
The dataset supports sim2real transfer through unified annotation protocols, enhancing policy evaluation and performance in manipulation benchmarks.

The RoboTwin Object Dataset (RoboTwin-OD) is a large-scale, semantically and functionally annotated library of 3D objects designed to support robust dual-arm (bimanual) robotic manipulation research. Developed as an integral part of the RoboTwin 2.0 and RoboTwin benchmark frameworks, RoboTwin-OD addresses key limitations in existing synthetic manipulation datasets by providing a scalable, richly annotated, and diversified collection of objects with unified protocols for sim2real transfer and policy evaluation. Its construction leverages scalable 3D reconstruction, integration of third-party assets, and meticulous annotation of both semantic and interaction-relevant properties (Chen et al., 22 Jun 2025, Mu et al., 17 Apr 2025).

1. Dataset Composition and Structure

RoboTwin-OD consists of 731 unique 3D object instances organized into 147 semantic categories. The dataset is stratified as follows:

111 categories (534 instances) generated in-house: These are produced via multi-view RGB scan-based 3D reconstruction on the Rodin platform, followed by convex decomposition and mesh-merging techniques to ensure physical realism and simulation compatibility.
27 categories (153 instances) from Objaverse: Imported to increase visual, geometric, and semantic diversity, supplying distractor objects for richer scenes and improving robustness to domain variation.
9 articulated object categories (44 instances) from SAPIEN PartNet-Mobility: These instances enhance coverage of jointed and power-tool objects, facilitating learning and evaluation of articulated and compound object interactions.

Each semantic category contains from one to more than 20 instances, providing both broad coverage and the intra-class diversity needed for domain randomization and policy generalization (Chen et al., 22 Jun 2025).
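As a quick consistency check, the per-source counts listed above can be tallied in a few lines of Python. This is a minimal sketch; the `COMPOSITION` dictionary and the helper names are illustrative and are not part of any RoboTwin-OD tooling.

```python
# Per-source breakdown reported for RoboTwin-OD: (categories, instances).
COMPOSITION: dict[str, tuple[int, int]] = {
    "in-house (Rodin)": (111, 534),
    "Objaverse": (27, 153),
    "SAPIEN PartNet-Mobility": (9, 44),
}


def totals(composition: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum category and instance counts over all sources."""
    n_categories = sum(c for c, _ in composition.values())
    n_instances = sum(i for _, i in composition.values())
    return n_categories, n_instances


def instance_share(composition: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Fraction of all instances contributed by each source."""
    _, n_instances = totals(composition)
    return {src: inst / n_instances for src, (_, inst) in composition.items()}


if __name__ == "__main__":
    print(totals(COMPOSITION))  # (147, 731)
    for source, share in instance_share(COMPOSITION).items():
        print(f"{source}: {share:.1%}")
```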

2. Annotation Schema and Metadata

Every object in RoboTwin-OD is furnished with comprehensive semantic and manipulation-relevant annotations:

Semantic Labels:
- class_id, class_name
- Fifteen natural language descriptions per object, capturing properties such as color, material, texture, shape, and function (e.g., "red soda bottle", "white ceramic mug with handle").
- Category-level affordance tags (e.g., "pourable", "stackable", "articulated_hinge").
Manipulation-Relevant Labels:
- placement_points: 3D coordinates indicating feasible resting poses on flat surfaces.
- functional_points: Surface keypoints identifying operationally salient features (e.g., lid, handle base).
- grasping_points: 3D coordinates supporting stable two-finger or parallel-jaw grasps.
- grasp_axes: Unit vectors defining canonical approach directions for various robot embodiments (e.g., top-down for Franka, side-grasp for Piper robots).
- collision_mesh: Physically accurate, convex-decomposed collision mesh for efficient contact and physics simulation.

Annotations are stored in a per-instance metadata JSON file (e.g., 001_bottle_0_metadata.json; see Section 4), which is straightforward to parse and integrate into manipulation pipelines (Chen et al., 22 Jun 2025, Mu et al., 17 Apr 2025).
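A minimal sketch of parsing and validating one per-instance annotation file follows. It assumes the JSON keys match the field names listed above and that points and axes are stored as lists of `[x, y, z]` triples; the real file layout may differ, and the `ObjectAnnotation` class, the `parse_annotation` function, and the example values are hypothetical.

```python
import json
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class ObjectAnnotation:
    """Hypothetical in-memory view of one instance's metadata file."""
    class_id: int
    class_name: str
    placement_points: list[Point]
    functional_points: list[Point]
    grasping_points: list[Point]
    grasp_axes: list[Point]


def _as_points(raw: list, field: str) -> list[Point]:
    """Convert [x, y, z] entries to float triples, rejecting non-3D entries."""
    points = []
    for entry in raw:
        if len(entry) != 3:
            raise ValueError(f"{field}: expected a 3D coordinate, got {entry!r}")
        points.append((float(entry[0]), float(entry[1]), float(entry[2])))
    return points


def parse_annotation(text: str, tol: float = 1e-3) -> ObjectAnnotation:
    """Parse a metadata JSON string and check that every grasp axis is a unit vector."""
    meta = json.loads(text)
    axes = _as_points(meta.get("grasp_axes", []), "grasp_axes")
    for axis in axes:
        norm = math.sqrt(sum(c * c for c in axis))
        if abs(norm - 1.0) > tol:
            raise ValueError(f"grasp axis {axis} is not unit length (norm={norm:.4f})")
    return ObjectAnnotation(
        class_id=int(meta["class_id"]),
        class_name=str(meta["class_name"]),
        placement_points=_as_points(meta.get("placement_points", []), "placement_points"),
        functional_points=_as_points(meta.get("functional_points", []), "functional_points"),
        grasping_points=_as_points(meta.get("grasping_points", []), "grasping_points"),
        grasp_axes=axes,
    )


# Hypothetical example values, not taken from the dataset.
EXAMPLE = """{
  "class_id": 1,
  "class_name": "bottle",
  "placement_points": [[0.0, 0.0, 0.0]],
  "functional_points": [[0.0, 0.0, 0.21]],
  "grasping_points": [[0.0, 0.0, 0.12]],
  "grasp_axes": [[0.0, 0.0, -1.0]]
}"""

if __name__ == "__main__":
    ann = parse_annotation(EXAMPLE)
    print(ann.class_name, ann.grasping_points, ann.grasp_axes)
```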

3. Generation Methodology

RoboTwin-OD employs a multi-pronged asset generation process, combining in-house model acquisition and external dataset import:

In-house Generation: Each of the 534 in-house models is reconstructed from multi-view RGB scans using the Rodin platform, resulting in watertight triangle meshes. These are then subject to convex decomposition to create simulation-ready collision hulls.
Objaverse and PartNet-Mobility Integration: External assets, selected for category coverage and semantic complementarity, are post-processed for mesh cleaning and annotation harmonization.
Post-Processing: All models undergo automated and manual checks that verify and sample grasp axes and placement points and ensure that affordance annotations are aligned. Categories with visually or structurally similar instances are manually grouped to support structured domain randomization.
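One plausible use of the similarity grouping, assumed here rather than taken from the RoboTwin pipeline, is keeping distractors unambiguous: once a target category is chosen, categories grouped with it are excluded from the distractor pool. The group contents, category names, and helper functions below are hypothetical.

```python
import random

# Hypothetical groups of visually or structurally similar categories.
SIMILARITY_GROUPS: list[set[str]] = [
    {"bottle", "soda_can", "cup"},
    {"hammer", "screwdriver"},
    {"mug", "teapot"},
]


def similar_categories(category: str, groups: list[set[str]]) -> set[str]:
    """Return every category grouped with `category`, including itself."""
    for group in groups:
        if category in group:
            return set(group)
    return {category}


def sample_distractors(target: str, all_categories: set[str],
                       groups: list[set[str]], k: int,
                       rng: random.Random) -> list[str]:
    """Sample up to k distractor categories that are not similar to the target."""
    excluded = similar_categories(target, groups)
    # Sort before sampling so a seeded rng gives reproducible scenes.
    candidates = sorted(c for c in all_categories if c not in excluded)
    return rng.sample(candidates, min(k, len(candidates)))


ALL_CATEGORIES = {"bottle", "soda_can", "cup", "hammer", "screwdriver", "mug", "teapot", "block"}

if __name__ == "__main__":
    print(sample_distractors("bottle", ALL_CATEGORIES, SIMILARITY_GROUPS, 3, random.Random(0)))
```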

The pipeline also uses feature matching and LLM-assisted annotation to transfer spatial annotations accurately between instances (e.g., keypoint transfer based on diffusion-model features) (Chen et al., 22 Jun 2025, Mu et al., 17 Apr 2025, Mu et al., 2024).
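At its core, feature-based keypoint transfer is a nearest-neighbour search in descriptor space: a keypoint annotated on a source instance is mapped to the target surface point whose descriptor is most similar. The sketch below uses cosine similarity over plain Python lists. The toy descriptors stand in for the diffusion-model features mentioned above, which this example does not compute, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def transfer_keypoint(source_descriptor: Vector,
                      target_points: Sequence[tuple[float, float, float]],
                      target_descriptors: Sequence[Vector]) -> tuple[float, float, float]:
    """Map a source keypoint to the target point with the most similar descriptor."""
    if not target_points or len(target_points) != len(target_descriptors):
        raise ValueError("need one descriptor per target point")
    best = max(range(len(target_points)),
               key=lambda i: cosine_similarity(source_descriptor, target_descriptors[i]))
    return target_points[best]


# Toy data: a descriptor sampled at a source "handle base" keypoint, plus
# three candidate surface points on a target instance and their descriptors.
SOURCE_DESCRIPTOR = [0.9, 0.1, 0.0]
TARGET_POINTS = [(0.0, 0.0, 0.10), (0.04, 0.0, 0.06), (0.0, 0.0, 0.0)]
TARGET_DESCRIPTORS = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]

if __name__ == "__main__":
    print(transfer_keypoint(SOURCE_DESCRIPTOR, TARGET_POINTS, TARGET_DESCRIPTORS))  # (0.04, 0.0, 0.06)
```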

4. Data Properties, Formats, and Access

RoboTwin-OD is distributed in a hierarchical directory structure with standardized file formats:

objects/
  ├─ 001_bottle/
  │    ├─ 001_bottle_0.glb          # High-poly mesh
  │    ├─ 001_bottle_0_collision.obj # Collision mesh
  │    ├─ 001_bottle_0_metadata.json # Annotation metadata
  │    ├─ textures/…                 # Texture files
  │    ├─ descriptions.txt           # 15 descriptions
  ├─ ...

Each object instance includes:

High-resolution visual mesh (.glb)
Convex-decomposed collision mesh (.obj)
Manipulation/semantic metadata (metadata.json)
Optional textures and natural language description files
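The naming convention in the directory tree can be turned into small path helpers. This sketch assumes every instance follows the `<category>_<index>` pattern shown for `001_bottle`; `instance_files`, `parse_metadata_name`, and `iter_instances` are hypothetical helpers, not part of the RoboTwin-OD loader.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Union

# Matches names such as "001_bottle_0_metadata.json" -> ("001_bottle", 0).
_METADATA_NAME = re.compile(r"^(?P<category>.+)_(?P<index>\d+)_metadata\.json$")


@dataclass(frozen=True)
class InstanceFiles:
    visual: Path      # high-poly .glb mesh
    collision: Path   # convex-decomposed .obj mesh
    metadata: Path    # annotation JSON


def instance_files(root: Union[str, Path], category: str, index: int) -> InstanceFiles:
    """Build the expected file paths for one instance, e.g. ("001_bottle", 0)."""
    folder = Path(root) / category
    stem = f"{category}_{index}"
    return InstanceFiles(
        visual=folder / f"{stem}.glb",
        collision=folder / f"{stem}_collision.obj",
        metadata=folder / f"{stem}_metadata.json",
    )


def parse_metadata_name(name: str) -> tuple[str, int]:
    """Split a metadata file name into (category folder, instance index)."""
    match = _METADATA_NAME.match(name)
    if match is None:
        raise ValueError(f"not a metadata file name: {name!r}")
    return match["category"], int(match["index"])


def iter_instances(root: Union[str, Path]) -> Iterator[tuple[str, int]]:
    """Yield (category, index) for every *_metadata.json one level below root."""
    for path in sorted(Path(root).glob("*/*_metadata.json")):
        yield parse_metadata_name(path.name)
```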

Bounding boxes are axis-aligned and computed over all mesh vertices $v_i = (x_i, y_i, z_i)$:

$B = [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}] \times [z_{\min}, z_{\max}], \qquad x_{\min} = \min_i x_i,\; x_{\max} = \max_i x_i$

and analogously for $y$ and $z$. Optional fields allow mass and inertia to be assigned, either from a uniform density or manually, for more detailed simulation. All assets are licensed under CC BY 4.0 and available via HuggingFace and the RoboTwin documentation site (Chen et al., 22 Jun 2025).
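The bounding box and a uniform-density mass can both be computed directly from mesh data. The sketch below assumes a closed, consistently oriented triangle mesh given as vertex and face lists in metres, with density in kg/m^3. Volume follows from the divergence theorem as the sum of signed tetrahedron volumes $\tfrac{1}{6}\,\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})$ over faces. Inertia is not computed, and summing over a convex-decomposed collision mesh would over-count any volume where hulls overlap. All function names are hypothetical.

```python
from typing import Sequence

Vertex = tuple[float, float, float]
Face = tuple[int, int, int]


def aabb(vertices: Sequence[Vertex]) -> tuple[Vertex, Vertex]:
    """Axis-aligned bounding box: per-axis minimum and maximum over all vertices."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def mesh_volume(vertices: Sequence[Vertex], faces: Sequence[Face]) -> float:
    """Volume of a closed, consistently oriented triangle mesh (divergence theorem)."""
    six_v = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # a . (b x c) is six times the signed volume of tetrahedron (origin, a, b, c)
        six_v += ax * (by * cz - bz * cy) + ay * (bz * cx - bx * cz) + az * (bx * cy - by * cx)
    return abs(six_v) / 6.0


def uniform_density_mass(vertices: Sequence[Vertex], faces: Sequence[Face],
                         density: float) -> float:
    """Mass of a homogeneous solid: density (kg/m^3) times volume (m^3)."""
    return density * mesh_volume(vertices, faces)


# Unit right tetrahedron with outward-facing triangles; its volume is 1/6 m^3.
TETRA_VERTS = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
TETRA_FACES = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

if __name__ == "__main__":
    print(aabb(TETRA_VERTS))                                        # ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    print(uniform_density_mass(TETRA_VERTS, TETRA_FACES, 1000.0))  # about 166.67 kg
```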

5. Benchmark Integration and Protocols

RoboTwin-OD is integral to the RoboTwin and RoboTwin 2.0 benchmark suites for dual-arm robotic manipulation, supporting evaluation across both simulated and real-world domains:

Task Population: Each benchmark task samples a specific subset of RoboTwin-OD models, enabling both broad and fine-grained task variation (e.g., hammer-beat, dual-bottle pick, block handover).
Simulated & Real Benchmarks:
- Simulation: Utilizes ManiSkill3 and SAPIEN with multi-view RGB-D sensing.
- Real-world: Validated on the COBOT Magic Robot system via teleoperation.
Annotations: Spatial relation labels (keypoints, axes) and bounding boxes enable functionally aligned task parameterization and code generation, supporting spatial constraint specification within LLM-generated expert policies.
Evaluation Metrics: Success rate (percentage of successfully completed episodes), placement error (mean and standard deviation of the Euclidean distance to the goal), and collision rate are reported.
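The three metrics can be computed from per-episode records as sketched below. The `Episode` record, its fields, reporting collision rate as a percentage, and the use of the population standard deviation (`statistics.pstdev`) are assumptions for illustration; the benchmark's own evaluation code may define success, goal position, and collisions differently.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical per-episode record produced by an evaluation run."""
    success: bool
    final_position: tuple[float, float, float]  # object position at episode end (m)
    goal_position: tuple[float, float, float]   # target position (m)
    collided: bool


def evaluate(episodes: list[Episode]) -> dict[str, float]:
    """Success and collision rates in percent; placement error mean/std in metres."""
    if not episodes:
        raise ValueError("no episodes to evaluate")
    n = len(episodes)
    errors = [math.dist(e.final_position, e.goal_position) for e in episodes]
    return {
        "success_rate": 100.0 * sum(e.success for e in episodes) / n,
        "placement_error_mean": statistics.mean(errors),
        "placement_error_std": statistics.pstdev(errors),  # population std
        "collision_rate": 100.0 * sum(e.collided for e in episodes) / n,
    }


EXAMPLE_EPISODES = [
    Episode(True, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), False),
    Episode(False, (0.3, 0.4, 0.0), (0.0, 0.0, 0.0), True),
]

if __name__ == "__main__":
    print(evaluate(EXAMPLE_EPISODES))
```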

Task protocols ensure no repetition of object designs between training and test splits, supporting zero-shot or generalization studies (Chen et al., 22 Jun 2025, Mu et al., 17 Apr 2025, Mu et al., 2024).
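A minimal sketch of an instance-disjoint split follows, assuming objects are identified by `(category, instance_id)` pairs. Each category's instances are shuffled with a fixed seed and partitioned so that no instance appears in both splits, with single-instance categories kept in training. `split_by_design` is a hypothetical helper, and the actual RoboTwin task protocols may hold out objects differently.

```python
import random
from collections import defaultdict
from typing import Iterable

Instance = tuple[str, int]  # (category, instance_id)


def split_by_design(instances: Iterable[Instance], test_fraction: float,
                    seed: int = 0) -> tuple[list[Instance], list[Instance]]:
    """Partition instances so that no object design appears in both splits.

    Each category with at least two instances contributes at least one
    instance to each split; single-instance categories stay in training.
    """
    by_category: dict[str, set[int]] = defaultdict(set)
    for category, instance_id in instances:
        by_category[category].add(instance_id)

    rng = random.Random(seed)
    train: list[Instance] = []
    test: list[Instance] = []
    for category in sorted(by_category):  # sorted for reproducibility
        ids = sorted(by_category[category])
        rng.shuffle(ids)
        if len(ids) > 1:
            n_test = min(len(ids) - 1, max(1, round(len(ids) * test_fraction)))
        else:
            n_test = 0
        test += [(category, i) for i in ids[:n_test]]
        train += [(category, i) for i in ids[n_test:]]
    return train, test


EXAMPLE_INSTANCES = [("bottle", i) for i in range(5)] + [("mug", i) for i in range(2)] + [("hammer", 0)]
```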

6. Access, Usage Guidelines, and Example Code

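RoboTwin-OD is compatible with mainstream simulation and robotics pipelines. The example below uses the RoboTwin-OD loader together with the trimesh and json libraries to load the visual and collision meshes, read the annotations, and visually inspect the grasp axes:

```python
import json

import numpy as np
import trimesh
from robotwin_od import ObjectDataset

ds = ObjectDataset("/path/to/objects/")
obj = ds.get_object(category="bottle", instance_id=0)

# Visual (.glb) and convex-decomposed collision (.obj) meshes; force="mesh"
# flattens a multi-part scene into a single Trimesh.
vis_mesh = trimesh.load(obj.visual_path, force="mesh")
col_mesh = trimesh.load(obj.collision_path, force="mesh")

with open(obj.metadata_path) as f:
    meta = json.load(f)

# Pair each grasp point with its approach axis (assumes one axis per grasp point).
grasp_pts = np.asarray(meta["grasping_points"], dtype=float).reshape(-1, 3)
grasp_axes = np.asarray(meta["grasp_axes"], dtype=float).reshape(-1, 3)

# Draw every axis as a 5 cm line segment starting at its grasp point and
# show all segments on the visual mesh in a single viewer window.
segments = np.stack([grasp_pts, grasp_pts + 0.05 * grasp_axes], axis=1)  # (N, 2, 3)
scene = trimesh.Scene([vis_mesh, trimesh.load_path(segments)])
scene.show()
```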

Together, the loader and the per-instance metadata support asset integration, semantic and functional queries, and affordance-driven grasp planning for both simulated and deployed robotic systems (Chen et al., 22 Jun 2025).

7. Empirical Impact and Performance

Empirical studies using RoboTwin-OD as a pre-training and benchmarking substrate report substantial improvements in sim2real policy transfer and dual-arm manipulation competence:

Policies pre-trained with RoboTwin-OD simulated trajectories and fine-tuned on limited real-world data achieve mean success rates of approximately 72% (single-arm) and 62% (dual-arm), compared to 1.2% and 20%, respectively, with exclusively real data (20 demos/task).
Fine-tuning protocol recommendations include data augmentation (brightness adjustment) and consistent object pose/size randomization.
VLA models fine-tuned on RoboTwin-OD data achieve a 367% relative improvement (42.0% vs. 9.0%) on unseen real-world tasks and up to a 228% relative gain in zero-shot transfer, underlining the dataset's effectiveness for generalization and robust manipulation (Chen et al., 22 Jun 2025, Mu et al., 17 Apr 2025, Mu et al., 2024).
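The relative improvements quoted above follow $\Delta_{\text{rel}} = 100\% \times (s_{\text{new}} - s_{\text{base}}) / s_{\text{base}}$. A short check of the VLA figure, using a hypothetical helper name:

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    print(f"{relative_improvement(42.0, 9.0):.1f}%")  # 366.7%, reported above as 367%
```

A 367% relative improvement therefore corresponds to a success rate roughly 4.7 times the baseline (42.0 / 9.0).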

8. Licensing and Download

All data, code, and generation pipelines for RoboTwin-OD are distributed under the CC BY 4.0 license. Assets, metadata, and usage documentation are accessible online: