MultiDex Grasping Dataset
- The dataset provides a large-scale, force-closure validated repository with over 436k dexterous grasps across varied robotic hand types and object configurations.
- It employs a rigorous, physics-based, contact-centric sampling pipeline with multi-modal annotations, including SE(3) pose, joint angles, and object geometry.
- Applications include vision-to-grasp mapping, representation learning, and sim-to-real transfer, enabling hand-agnostic evaluation and cross-hand generalization.
The MultiDex Grasping Dataset is a comprehensive, large-scale repository of robotic grasp demonstrations specifically designed to support the development, benchmarking, and generalization of dexterous grasp synthesis algorithms across varied robotic hands. MultiDex encompasses diverse object and hand kinematic configurations and is referenced in multiple foundational works in the grasping literature, notably "GenDexGrasp: Generalizable Dexterous Grasping" (Li et al., 2022) and "Towards a Multi-Embodied Grasping Agent" (Freiberg et al., 31 Oct 2025). Its purpose is to provide standardized, force-closure validated grasp data and rich multi-modal annotations to enable generalizable, hand-agnostic learning methods for robotic grasping.
1. Scope and Dataset Composition
MultiDex aggregates a wide spectrum of grasp samples, hand configurations, and object types, making it a benchmark for generalizable manipulation research.
- Total Grasp Instances: 436,000 valid dexterous grasps in (Li et al., 2022); 20,000,000 grasps across 25,000 scenes in (Freiberg et al., 31 Oct 2025).
- Hand and Gripper Diversity:
- EZGripper (2-finger parallel) — 2 DoF
- Barrett Hand (3-finger underactuated) — 8 DoF
- Robotiq-3F (3-finger dexterous) — 6–8 DoF; Allegro Hand (4-finger dexterous) — 16 DoF
- Shadow Hand (5-finger, anthropomorphic) — 20–22 DoF
- Additional: DEX-EE, Franka Emika Panda, ViperX 300s, and others
- Object Set:
- 58–1,000+ daily-use household and industrial objects (YCB, ContactDB, Google Scanned Objects)
- Meshes in .ply/.obj format, scale-normalized for consistent grasping statistics
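The scale normalization mentioned above can be sketched as follows. The dataset's exact convention is not specified here; centering the mesh and rescaling its bounding-box diagonal to unit length is one common choice and is an assumption in this sketch:

```python
import numpy as np

def normalize_mesh(verts: np.ndarray, target_diag: float = 1.0) -> np.ndarray:
    """Center a vertex array and rescale its bounding-box diagonal to
    target_diag. One common convention; the dataset's exact rule may differ."""
    center = (verts.max(axis=0) + verts.min(axis=0)) / 2.0
    verts = verts - center
    diag = np.linalg.norm(verts.max(axis=0) - verts.min(axis=0))
    return verts * (target_diag / diag)

# Example: a random point set normalized to a unit bounding-box diagonal
rng = np.random.default_rng(0)
v = rng.random((100, 3)) * 5.0
v_norm = normalize_mesh(v)
```

Normalizing all meshes this way keeps grasp statistics (e.g., contact distances) comparable across objects of different physical sizes.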
Table: Hand Types and Dataset Size
| Hand Model | Fingers | DoF | # Grasps | Objects (min) |
|---|---|---|---|---|
| EZGripper | 2 | 2 | ~85k | 58 |
| Barrett | 3 | 8 | ~87k | 58 |
| Robotiq-3F | 3 | 6-8 | ~87k | 58 |
| Allegro | 4 | 16 | ~87k | 58 |
| ShadowHand | 5 | 20-22 | ~87k | 58 |
Objects are split into fixed train/test subsets for standardized evaluation (e.g., 48 train, 10 test in (Li et al., 2022)). The more recent (Freiberg et al., 31 Oct 2025) version notably features 25,000 cluttered scenes with up to 7 objects each and up to five grippers per scene.
2. Data Annotation: Modalities and Representation
MultiDex provides multi-modal, structured storage for each grasp instance to facilitate diverse modes of learning and evaluation.
- Hand Pose: Parameterized by an SE(3) root (end-effector) pose and a full configuration vector of joint angles q ∈ ℝ^{D_g}, with D_g determined by the hand model.
- Object Geometry: Triangular mesh per instance; sampled vertex sets and surface normals; contact regions specified.
- Contact Map: For each grasp, an object-centric, continuous-valued per-vertex contact map is computed from the distance between each object surface point and the nearest hand-surface point, saturating at contact and decaying smoothly with distance. This provides a hand-agnostic, surface-attentive signal for learning transferable grasp descriptors.
- Grasp Quality Metrics: Differentiable force-closure (dfc) scores, computed as the norm ‖G c‖ of the grasp matrix G applied to the stacked contact normals c at the sampled contact sites, following the differentiable force-closure estimator of Liu et al.
- Stability Flags: Binary indicator derived from physics testing under 6-axis disturbance in simulation (Isaac Gym, applying 0.5 m/s² accelerations along each axis for 1 s; a grasp succeeds if the object's displacement stays below a centimeter-scale threshold).
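The contact-map and dfc annotations can be sketched as follows. The sigmoid falloff rate `gamma` and the exact dfc formulation are assumptions here, styled after the Liu et al. differentiable force-closure estimator referenced above:

```python
import numpy as np

def contact_map(obj_verts, hand_points, gamma=50.0):
    """Per-vertex contact value in [0, 1]: 1 at contact, decaying with
    distance to the nearest hand-surface point. Sigmoid falloff and
    gamma=50 are illustrative assumptions, not the dataset's exact values."""
    # pairwise distances: object vertex -> hand surface point
    d = np.linalg.norm(obj_verts[:, None, :] - hand_points[None, :, :], axis=-1)
    d_min = d.min(axis=1)
    sigmoid = 1.0 / (1.0 + np.exp(-gamma * d_min))
    return 2.0 * (1.0 - sigmoid)   # d=0 -> 1, large d -> 0

def dfc_score(contact_pts, contact_normals):
    """Differentiable force-closure surrogate ||G c||: G stacks identity
    (force) and cross-product (torque) blocks per contact; c stacks the
    contact normals. Zero indicates balanced wrenches."""
    n = len(contact_pts)
    G = np.zeros((6, 3 * n))
    for i, x in enumerate(contact_pts):
        G[:3, 3 * i:3 * i + 3] = np.eye(3)
        G[3:, 3 * i:3 * i + 3] = np.array([[0, -x[2], x[1]],
                                           [x[2], 0, -x[0]],
                                           [-x[1], x[0], 0]])
    c = np.asarray(contact_normals, dtype=float).reshape(-1)
    return float(np.linalg.norm(G @ c))
```

For example, two antipodal contacts with opposing inward normals yield a dfc score of zero, the force-closure ideal.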
3. Sampling and Quality Control Methodologies
Grasp synthesis in MultiDex is defined by a rigorous, physics-based, and contact-centric sampling pipeline.
- Force-closure optimization: MALA (Metropolis-adjusted Langevin algorithm) sampling minimizes a composite energy E = E_fc + E_joint + E_pen integrating force-closure, joint-limit, and penetration terms, with E_joint enforcing joint bounds and E_pen penalizing hand–object mesh penetration.
- Physical Validation: Grasps are recorded only if they pass collision-free placement, force-closure, and post-lift disturbance tests. No failure samples are retained in the public release.
- Hand-agnostic sampling: By leveraging contact maps and surface-aligned metrics, learned grasp representations are decoupled from any specific hand’s kinematics, allowing cross-hand and cross-object generalization (Li et al., 2022).
- Dataset-level Filtering: Scenes/objects with insufficient grasp yield are pruned to maintain uniform data density for learning (Freiberg et al., 31 Oct 2025).
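The MALA step in the pipeline above can be sketched generically. This is a minimal, textbook MALA loop over a hand configuration vector, not the dataset's actual hyperparameters or energy terms; the toy quadratic energy stands in for the composite grasp energy:

```python
import numpy as np

def mala_sample(q0, energy, grad, step=1e-3, n_iters=1000, rng=None):
    """Metropolis-adjusted Langevin sampling targeting exp(-energy(q)).

    energy(q): composite grasp energy (force-closure + joint-limit +
    penetration terms in the real pipeline); grad(q): its gradient.
    """
    rng = rng or np.random.default_rng(0)
    q, e = q0.copy(), energy(q0)
    for _ in range(n_iters):
        noise = rng.normal(size=q.shape)
        prop = q - step * grad(q) + np.sqrt(2.0 * step) * noise
        e_prop = energy(prop)

        def log_prop_density(a, b):
            # log density of proposing a from b under the Langevin kernel
            diff = a - (b - step * grad(b))
            return -np.sum(diff**2) / (4.0 * step)

        # Metropolis-Hastings correction keeps the chain unbiased
        log_alpha = (e - e_prop) \
            + log_prop_density(q, prop) - log_prop_density(prop, q)
        if np.log(rng.uniform()) < log_alpha:
            q, e = prop, e_prop
    return q, e

# Toy check: a quadratic energy, so samples concentrate near the minimum
E = lambda q: 0.5 * np.sum(q**2)
dE = lambda q: q
q_final, e_final = mala_sample(np.ones(5) * 3.0, E, dE, step=0.05, n_iters=2000)
```

In the dataset pipeline, accepted low-energy samples then go through the physical validation stage before being recorded.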
4. Data Organization, Splits, and Access
MultiDex data are organized for high-throughput loading and flexible batching.
- Directory Structure: Data are split by hand and object under `MultiDex/hands/HandName/Object/samples/`. Each grasp is a NumPy `.npz` file containing:
  - `"q_global"`: 6D root pose,
  - `"q_joint"`: joint angles,
  - `"verts"`, `"normals"`: mesh arrays,
  - `"contact_map"`: per-vertex float,
  - `"dfc"`: scalar score.
- Unified Scene Format (Freiberg et al., 31 Oct 2025): each `.npz` scene contains:
  - `point_cloud`: float32 [15,000 × 3],
  - `object_ids`: object indices,
  - `grasps`: float32 [#grasps × (7 + D_g)],
  - a gripper-specific kinematic YAML.
- Annotation Example (Python):

```python
import numpy as np

data = np.load("MultiDex/hands/ShadowHand/apple/samples/00001.npz")
qg, qj = data["q_global"], data["q_joint"]
verts, cmap = data["verts"], data["contact_map"]
```
- Split Files: Standard train/test object lists are provided (e.g., `train_objects.json`, `test_objects.json`) for statistical benchmarking.
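A loader for the unified scene format can be sketched as follows. The file paths are hypothetical, and the split of the 7-D pose into translation plus quaternion is an assumption consistent with the `[#grasps × (7 + D_g)]` layout:

```python
import json
import numpy as np

def load_scene(path, d_g):
    """Load a unified-format scene .npz and split the flat grasp array
    into a 7-D root pose (assumed translation + quaternion) and the
    D_g-dimensional joint-angle vector for the gripper."""
    data = np.load(path)
    cloud = data["point_cloud"]            # float32 [N_points, 3]
    grasps = data["grasps"]                # float32 [#grasps, 7 + D_g]
    poses, joints = grasps[:, :7], grasps[:, 7:7 + d_g]
    return cloud, data["object_ids"], poses, joints

def load_split(path):
    """Read a train/test object list such as train_objects.json."""
    with open(path) as f:
        return json.load(f)
```

A typical call would be `load_scene("MultiDex/scenes/scene_00001.npz", d_g=16)` for an Allegro-style 16-DoF gripper (path illustrative).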
5. Gripper Kinematics and Cross-Hand Generalization
Each robotic hand in MultiDex is fully specified by standard Denavit–Hartenberg or product-of-exponentials (PoE) kinematic models, supporting precise forward kinematics from arbitrary joint configurations to SE(3) poses. The fundamental pose calculation uses the product-of-exponentials formulation

T(θ) = e^{[ξ₁]θ₁} e^{[ξ₂]θ₂} ⋯ e^{[ξₙ]θₙ} M,

where θᵢ are the joint values, ξᵢ are the joint twists for link i, and M is the home configuration (Freiberg et al., 31 Oct 2025).
This explicit, modular model of hand kinematics underpins the generalizability of learning algorithms, enabling flow-based and diffusion-based synthesis on the product manifold of SE(3) root pose and joint angles, for handling arbitrary gripper types and degrees of freedom (Freiberg et al., 31 Oct 2025).
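The product-of-exponentials forward kinematics can be sketched directly. This is a standard closed-form implementation (unit-norm rotation axes assumed), not code from the dataset's release:

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix [w] such that [w] v = w x v."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_twist(xi, theta):
    """Closed-form e^{[xi] theta} for a twist xi = (v, w), ||w|| in {0, 1}."""
    v, w = np.asarray(xi[:3], float), np.asarray(xi[3:], float)
    T = np.eye(4)
    if np.allclose(w, 0.0):                  # prismatic joint: pure translation
        T[:3, 3] = v * theta
        return T
    W = skew(w)
    # Rodrigues rotation plus the standard SE(3) translation term
    R = np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)
    G = (np.eye(3) * theta + (1.0 - np.cos(theta)) * W
         + (theta - np.sin(theta)) * (W @ W))
    T[:3, :3], T[:3, 3] = R, G @ v
    return T

def poe_fk(twists, thetas, M):
    """PoE forward kinematics: T = e^{[xi1] th1} ... e^{[xin] thn} M."""
    T = np.eye(4)
    for xi, th in zip(twists, thetas):
        T = T @ exp_twist(xi, th)
    return T @ M

# Toy 1-DoF example: a revolute joint about z through the origin, with the
# home pose M placing the fingertip 1 unit along x; rotating by pi/2
# swings the tip onto the y axis.
xi_z = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
M = np.eye(4); M[0, 3] = 1.0
T = poe_fk([xi_z], [np.pi / 2.0], M)
```

Because every hand reduces to a twist list plus a home pose, the same routine serves a 2-DoF gripper and a 22-DoF ShadowHand alike, which is the hand-agnostic property the text describes.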
6. Benchmarking, Metrics, and Reference Results
MultiDex benchmarking protocols employ standardized experimental setups.
- Success Rate: Fraction of physically validated grasps meeting displacement and lift criteria, reported per hand–object–split configuration.
- Diversity: Joint-angle standard deviation on a test set (e.g., 0.207 rad for GenDexGrasp on ShadowHand (Li et al., 2022)).
- Comparison Table (ShadowHand, test-set, (Li et al., 2022)):
| Method | Generalizable | Success (%) | Diversity (rad) | Time (s) |
|---|---|---|---|---|
| dfc [Liu et al.] | ✓ | 79.53 | 0.344 | >1800 |
| GraspCVAE | ✗ | 19.38/22.03* | 0.340/0.355 | 0.012/43.2 |
| UniGrasp | ✓ | 80.0† | 0.000 | 9.33 |
| GenDexGrasp | ✓ | 77.19 | 0.207 | 16.42 |
- Inference Speed: Modern flow-based models achieve <0.1 s per grasp on a 12 GB GPU (Freiberg et al., 31 Oct 2025).
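The two headline metrics are straightforward to compute. This sketch takes "diversity" to be the mean per-joint standard deviation over the test set, which is one plausible reading of the joint-angle std metric above:

```python
import numpy as np

def success_rate(flags):
    """Fraction of grasps whose physics-validation flag is set."""
    return float(np.mean(np.asarray(flags, dtype=float)))

def joint_diversity(joint_angles):
    """Mean per-joint standard deviation over a set of grasps
    (one plausible reading of the joint-angle std metric)."""
    q = np.asarray(joint_angles, dtype=float)   # shape [N_grasps, DoF]
    return float(np.mean(q.std(axis=0)))

# Toy example: 4 validation flags, 3 grasps for a 2-DoF gripper
flags = [1, 1, 0, 1]
q = np.array([[0.0, 0.5],
              [0.2, 0.7],
              [0.4, 0.9]])
```

A diversity of exactly 0 (as UniGrasp shows in the table) indicates the method always outputs the same joint configuration for a given object.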
7. Usage and Applications
MultiDex is designed for reproducible research and benchmarking in dexterous, generalizable grasp synthesis:
- Vision–to–grasp learning: Train CNNs or point-cloud networks for image-to-grasp mapping using multi-modal annotations.
- Representation learning: Contact-map and pose embeddings for hand-agnostic grasp reasoning.
- Algorithm evaluation: Standardized train/test splits enable objective comparison of generalization, diversity, and efficiency.
- Cross-hand transfer: Contact map intermediates enable hand-to-hand generalization without retraining.
- Sim-to-real transfer: Physically validated grasps facilitate adaptation to hardware via fine-tuning.
MultiDex and associated codebases for GenDexGrasp and multi-embodiment diffusion/flow-based models are released under open licenses, supporting direct integration into research pipelines (Li et al., 2022, Freiberg et al., 31 Oct 2025).
MultiDex Grasping Dataset thus provides the robotic manipulation research community with a rigorously annotated, physically validated, and hand-diverse resource for benchmarking and advancing generalizable dexterous grasping algorithms. Its force-closure-centric design, object/hand diversity, and hand-agnostic contact representations underpin its continued influence in embodied intelligence research.