UniDex-Dataset: Robotics, Grasping & Vision
- UniDex-Dataset for dexterous manipulation is a large-scale, robot-centric corpus featuring 52K trajectories with diverse hand morphologies and a human-in-the-loop retargeting pipeline.
- The grasping dataset provides over one million validated, optimized grasps for a 26-DoF ShadowHand, leveraging collision checks and force closure metrics.
- The cross-dataset visual testbed harmonizes multi-source image data with unified label ontologies, facilitating domain adaptation and bias analysis in visual recognition.
UniDex-Dataset refers to three distinct, high-profile datasets in robotics, grasping, and computer vision, each serving as a foundational resource in its respective domain. The term encompasses (1) the UniDex-Dataset for dexterous robot hand manipulation (Zhang et al., 23 Mar 2026), (2) the UniDex dataset for universal dexterous grasping (Xu et al., 2023), and (3) the UniDex cross-dataset visual testbed (Tommasi et al., 2014). Each instance of "UniDex-Dataset" targets large-scale, standardized, and generalizable data for research and benchmarking, albeit with divergent modalities and annotation schemes.
1. UniDex-Dataset for Universal Dexterous Robotic Manipulation
The UniDex-Dataset (Zhang et al., 23 Mar 2026) is a large-scale, robot-centric corpus designed to support universal dexterous hand control. It targets the challenges of scaling dexterous manipulation—especially the scarcity of large-scale, high-fidelity robotic hand data and the heterogeneity of robotic hand morphologies.
1.1 Dataset Composition and Statistics
- Source data: Derived from four egocentric, video-based human manipulation datasets (H2O, HOI4D, HOT3D, TACO), covering 51 diverse tool-use task categories.
- Scale: Over 52,000 temporally coherent manipulation trajectories recorded at 30 fps, totaling 9 million image–pointcloud–action frames.
- Robotic Hand Coverage: Includes trajectories retargeted to eight dexterous robotic hands spanning 6–24 active DoFs: Inspire Hand, Leap, Oymotion, Ability, Allegro, Shadow Hand, Wuji Hand, and Xhand (custom morphology).
- Scene and Task Diversity: 51 unique tabletop/kitchen setups; each task labeled by verb–object categories and accompanied by short language instructions.
Dataset Comparison Table
| Dataset | #Traj | #Hands | #Lang. Tasks | #Scenes | RGB | Depth | Pointcloud |
|---|---|---|---|---|---|---|---|
| UniDex-Dataset | 52 K | 8 | 51 | 51 | ✔ | ✔ | ✔ |
| ActionNet (2025) | 30 K | 2 | 51 | 55 | ✔ | ✔ | low |
| RoboMind (2024) | 19 K | 1 | 55 | 55 | ✔ | ✔ | ✗ |
| RealDex (2024) | 2 K | 2 | 51 | 55 | ✔ | ✔ | ✗ |
1.2 Data Capture and Format
- UniDex-Cap: A portable rig comprising an Apple Vision Pro headset (for hand/head 6D pose tracking) and an Intel RealSense L515 camera (RGB-D at 30 fps), calibrated via a GUI to align hand skeletons and pointclouds.
- Per-Frame Data Representation:
- RGB image
- Depth map
- Pointcloud (after human masking, with robot-hand mesh inserted)
- Proprioceptive state (robot joint configuration in the Function–Actuator–Aligned Space, FAAS; see Section 1.4)
- Action (target command in FAAS; a schema sketch follows this list)
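To make the per-frame layout concrete, the following is a minimal sketch of how one such record might be represented in Python. The field names, array shapes, and loader are illustrative assumptions, not the dataset's published schema.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class UniDexFrame:
    """One frame of a UniDex-Dataset trajectory (illustrative schema)."""
    rgb: np.ndarray          # (H, W, 3) uint8 color image
    depth: np.ndarray        # (H, W) float32 depth map, meters
    pointcloud: np.ndarray   # (N, 3) float32 points; human hand masked out,
                             # retargeted robot-hand mesh points inserted
    proprio: np.ndarray      # (82,) float32 robot joint state in FAAS
    action: np.ndarray       # (82,) float32 action target in FAAS
    instruction: str         # short language instruction for the task


def load_trajectory(path: str) -> list[UniDexFrame]:
    """Hypothetical loader for a 30 fps trajectory stored as a .npz archive."""
    data = np.load(path, allow_pickle=True)
    n_frames = data["rgb"].shape[0]
    return [
        UniDexFrame(
            rgb=data["rgb"][t],
            depth=data["depth"][t],
            pointcloud=data["pointcloud"][t],
            proprio=data["proprio"][t],
            action=data["action"][t],
            instruction=str(data["instruction"]),
        )
        for t in range(n_frames)
    ]
```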
1.3 Human-in-the-Loop Retargeting Pipeline
Retargeting from human to robot embodiment proceeds in two integrated stages:
- Kinematic Retargeting: Solves for the robot joint configuration $q$ and a 6-DoF wrist offset $T$ that minimize the keypoint-matching objective
$$\min_{q,\,T}\ \sum_{i}\left\|\mathbf{p}_i^{\mathrm{robot}}(q,T)-\mathbf{p}_i^{\mathrm{human}}\right\|^2,$$
where $\mathbf{p}_i^{\mathrm{robot}}$ and $\mathbf{p}_i^{\mathrm{human}}$ are corresponding hand keypoints, subject to joint-limit and mimic-joint constraints enforced via PyBullet IK. After the initial solution, a human expert adjusts sliders in a web GUI to refine contact plausibility, and IK is re-applied after each adjustment (see the sketch after this list).
- Visual Alignment: 2D hand segmentation (WiLoR + SAM2) and depth masking remove the human hand. The retargeted robot mesh is rendered into the pointcloud and RGB-D, producing "robot-only" synthetic observations.
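The IK step can be sketched with PyBullet's multi-end-effector solver. This is a minimal sketch under stated assumptions: fingertip links serve as the end effectors, human keypoints are already expressed in the robot's base frame, and the 6-DoF wrist offset is handled separately (e.g., as a floating base), so it is omitted here.

```python
import pybullet as p


def retarget_frame(robot_id, fingertip_links, human_keypoints, movable_joints):
    """One kinematic-retargeting step: pose the robot hand so its fingertip
    links match the tracked human fingertip positions.

    robot_id        : body id of the loaded robot-hand URDF
    fingertip_links : link indices used as IK end effectors (assumption)
    human_keypoints : target 3D fingertip positions in the robot base frame
    movable_joints  : indices of the hand's non-fixed joints, in order
    """
    # Multi-end-effector IK: jointly minimizes the distance between every
    # fingertip link and its corresponding human keypoint. Joint limits from
    # the URDF are respected by the solver; mimic couplings must be enforced
    # separately (e.g., via gear constraints).
    q = p.calculateInverseKinematics2(robot_id, fingertip_links, human_keypoints)

    # Apply the solution. In the human-in-the-loop pipeline, an expert then
    # nudges individual joints via GUI sliders and IK is re-run per adjustment.
    for jid, angle in zip(movable_joints, q):
        p.resetJointState(robot_id, jid, angle)
    return q
```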
1.4 Action Space: Function–Actuator–Aligned Space (FAAS)
- FAAS: Encodes the action/pose for all eight hand types in a shared, fixed 82-dimensional vector, mapping functionally similar actuators to common slots. Structure: a 9D wrist pose (rotation + translation) and 21 shared functional joint slots (e.g., finger pitch, pinch), with the remaining dimensions covering hand-specific and future extensions (sketched below).
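A minimal sketch of how a hand-specific joint reading might be scattered into the shared FAAS vector; the slot table, dimension split, and actuator names here are illustrative assumptions.

```python
import numpy as np

FAAS_DIM = 82   # shared action/pose vector size
WRIST_DIM = 9   # wrist pose slots (rotation + translation)

# Hypothetical slot map for one hand: actuator name -> shared FAAS index.
# Functionally similar actuators on different hands map to the same slot,
# which is what makes the representation morphology-agnostic.
ALLEGRO_SLOTS = {
    "index_pitch": 9,
    "index_pinch": 10,
    "middle_pitch": 11,
    # ... remaining actuators omitted for brevity
}


def to_faas(wrist_pose: np.ndarray, joints: dict[str, float]) -> np.ndarray:
    """Pack a wrist pose and named joint values into one FAAS vector.

    Slots for actuators this hand lacks stay zero, allowing a single
    policy head to drive hands with anywhere from 6 to 24 active DoFs.
    """
    vec = np.zeros(FAAS_DIM, dtype=np.float32)
    vec[:WRIST_DIM] = wrist_pose
    for name, value in joints.items():
        vec[ALLEGRO_SLOTS[name]] = value
    return vec
```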
1.5 Design Insights
- Explicit 3D pointclouds and hand masking are employed to close the sim-to-real visual gap and support occlusion-aware perception.
- Human-in-the-loop retargeting (with minimal expert input) enables rapid corpus growth and transfer across hand morphologies.
- The combination of large scale, high hand diversity, high-quality pointclouds, and associated language instructions makes this dataset a foundation for vision–language–action pretraining and universal hand policy learning.
2. UniDex Dataset for Universal Dexterous Grasping
The UniDex grasp dataset (Xu et al., 2023) addresses universal dexterous grasping by synthesizing over one million validated grasps in table-top settings for the 26-DoF ShadowHand.
2.1 Construction and Contents
- Object Models: 5,519 CAD meshes (133 categories), normalized and scaled randomly. Each is decomposed for efficient collision/penetration checking.
- Tabletop Scenes: Each object is dropped under gravity onto a table; the ShadowHand is initialized in a randomized "above-object" pose.
- Grasp Generation: Grasp poses are optimized by minimizing a composite energy
$$E = E_{\mathrm{fc}} + \lambda_{\mathrm{dis}} E_{\mathrm{dis}} + \lambda_{\mathrm{pen}} E_{\mathrm{pen}} + \lambda_{\mathrm{spen}} E_{\mathrm{spen}} + \lambda_{\mathrm{joint}} E_{\mathrm{joint}},$$
whose terms penalize poor force closure, finger–object distance, object/table penetration, self-penetration, and joint-limit violations (a sketch of this weighted sum follows the list).
- Validation: A static grasp must hold the object against gravity applied along each of the six axis directions and exhibit minimal penetration.
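The composite objective reduces to a weighted sum, sketched below with PyTorch so the energy stays differentiable for gradient-based grasp optimization; the weight values and term names are placeholders, not the paper's tuned settings.

```python
import torch


def grasp_energy(e_fc: torch.Tensor, e_dis: torch.Tensor, e_pen: torch.Tensor,
                 e_spen: torch.Tensor, e_joint: torch.Tensor,
                 w_dis: float = 0.1, w_pen: float = 1.0,
                 w_spen: float = 1.0, w_joint: float = 1.0) -> torch.Tensor:
    """Composite grasp energy (weights are illustrative placeholders).

    e_fc    : force-closure term (lower = more stable grasp)
    e_dis   : finger-to-object surface distance
    e_pen   : hand/object and hand/table penetration depth
    e_spen  : hand self-penetration
    e_joint : joint-limit violation
    Each term is differentiable w.r.t. the hand pose, so the total energy
    can be minimized with a standard optimizer (e.g., Adam) per grasp.
    """
    return e_fc + w_dis * e_dis + w_pen * e_pen + w_spen * e_spen + w_joint * e_joint
```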
2.2 Structure and Storage
- Per-object Data:
  - objects/: category/object_id directory structure with mesh files and metadata.
  - pointclouds/: .npz pointclouds with object/table labels.
  - grasps/: .npz files containing rotation quaternions, translations, joint configurations, the force-closure metric $Q_1$, and penetration depths (per object).
- Dataset Size: ≈80 GB; train/val/test splits over both seen and unseen categories (a loading sketch follows this list).
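A minimal loading sketch under the layout above; the file path and the key names inside the .npz archives are assumptions for illustration.

```python
import numpy as np

# Hypothetical per-object grasp file following the layout described above.
grasps = np.load("grasps/mug/mug_001.npz")

rotations = grasps["quaternion"]      # (G, 4) wrist orientations
translations = grasps["translation"]  # (G, 3) wrist positions
joint_cfgs = grasps["joint_config"]   # (G, J) joint angles for the 26-DoF hand
q1 = grasps["q1"]                     # (G,) force-closure metric per grasp
penetration = grasps["penetration"]   # (G,) penetration depth per grasp, cm

# Keep only high-quality, low-penetration grasps (thresholds illustrative).
mask = (q1 > 0.01) & (penetration < 0.5)
print(f"{mask.sum()} / {len(mask)} grasps pass the quality filter")
```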
2.3 Evaluation Protocols
- Proposal Metrics: Mean $Q_1$ (force-closure quality), mean penetration depth (cm), and grasp diversity measured as rotational/translational/joint-angle/point variance.
- Policy Metrics: Success for the goal-conditioned policy requires lifting the object 0.3 m above the table and bringing it within 0.05 m of the target (a success-check sketch follows this list).
- Baselines: UniDex's proposal pipeline (GraspIPDF + GraspGlow + TTA) achieves a mean $Q_1$ of $0.0322$ even on unseen categories, with an order-of-magnitude greater grasp diversity than previous approaches. The policy attains 0.74 train / 0.66 test success rates, outperforming the prior ILAD policy by at least 2.5×.
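The success criterion translates directly into a small check. The thresholds come from the protocol above; the pose representation and function signature are assumptions.

```python
import numpy as np

LIFT_HEIGHT = 0.3   # meters above the table surface (protocol threshold)
GOAL_RADIUS = 0.05  # meters from the commanded goal (protocol threshold)


def grasp_success(obj_pos: np.ndarray, table_height: float,
                  goal_pos: np.ndarray) -> bool:
    """Goal-conditioned success: the object must be lifted at least 0.3 m
    above the table AND end within 0.05 m of the goal position."""
    lifted = (obj_pos[2] - table_height) >= LIFT_HEIGHT
    at_goal = np.linalg.norm(obj_pos - goal_pos) <= GOAL_RADIUS
    return bool(lifted and at_goal)
```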
2.4 File Format and Usage
- Data available as .npz and .obj files for standard loading (e.g., via PyTorch).
- Intended for direct integration into proposal and policy learning pipelines for dexterous robot grasping.
3. UniDex: Cross-Dataset Visual Recognition Testbed
The UniDex-Dataset described in (Tommasi et al., 2014) is a harmonized multi-source object-recognition corpus, created for large-scale analysis of dataset bias and domain adaptation.
3.1 Source Dataset Integration
- Constituent Collections: Twelve well-known image datasets (ETH80, Caltech101/256, Bing, AwA, a-Yahoo, MSRCORID, PascalVOC07, SUN, Office, RGB-D, ImageNet).
- Label Unification: Ontology aligned using WordNet synsets; duplicates and ambiguous categories resolved by manual inspection and cleaning.
- Partitioning: Two merged variants are provided: a "dense" corpus (the four largest sources, 114 classes, ≈450K images) and a "sparse" corpus (all sources, 105 classes, ≈250K images).
3.2 Features and Evaluation Protocols
- Shared Feature Repository:
- Dense SIFT descriptors on normalized images, quantized into 1,000-dimensional Bag-of-Visual-Words histograms.
- "Object-Classemes": per-image responses of 1,000 SVM-based concept detectors (trained on ILSVRC2010).
- Domain-label Metadata: Every sample is tagged with original source, enabling systematic domain generalization experiments.
- Recommended Protocol: Leave-one-dataset-out evaluation, training on all but one source and testing on the held-out source to measure the generalization gap (see the sketch below).
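With the per-sample domain tags, the leave-one-dataset-out protocol is a short loop. This sketch assumes precomputed feature matrices (e.g., the 1,000-D BoW histograms) and uses scikit-learn's LinearSVC as a stand-in classifier.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC


def leave_one_dataset_out(X: np.ndarray, y: np.ndarray, domains: np.ndarray) -> dict:
    """Train on all source datasets but one; test on the held-out source.

    X       : (N, D) feature matrix (e.g., BoW or classeme features)
    y       : (N,) unified class labels
    domains : (N,) source-dataset tag for each sample
    Returns per-domain held-out accuracy; the drop relative to
    within-dataset accuracy quantifies the cross-dataset bias.
    """
    results = {}
    for d in np.unique(domains):
        held_out = domains == d
        clf = LinearSVC().fit(X[~held_out], y[~held_out])
        results[d] = accuracy_score(y[held_out], clf.predict(X[held_out]))
    return results
```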
3.3 Intended Use
Enables controlled study of recognition and domain adaptation methods with quantifiable cross-dataset bias, supported by reproducible splits, features, and scripts.
4. Related Datasets and Comparative Landscape
Each UniDex-Dataset instance advances the state of its field by addressing specific generalization bottlenecks:
- Dexterous Manipulation and Grasping: UniDex-Dataset (Zhang et al., 23 Mar 2026) surpasses prior robot datasets in scale and hand diversity. The UniDex grasp dataset (Xu et al., 2023) is the first to combine broad object coverage with validated, high-diversity grasp proposals.
- Cross-Dataset Benchmarks: UniDex (Tommasi et al., 2014) is a precursor to modern domain adaptation testbeds, offering explicit tools for SIFT/BoW and classeme-based representation comparisons.
5. Significance and Adoption
The UniDex-Dataset family provides the data backbone for:
- Pretraining and evaluation of vision–language–action (VLA) robot controllers with strong spatial, object, and cross-hand generalization (Zhang et al., 23 Mar 2026).
- Universal dexterous grasping research with robust diversity and transfer to unseen categories (Xu et al., 2023).
- Systematic evaluation of cross-domain recognition algorithms and quantification of dataset bias in visual categorization (Tommasi et al., 2014).
The explicit design, protocol, and distribution choices position these datasets as benchmarks for scalability, reproducibility, and research rigor across robot manipulation, grasping, and computer vision.
6. Limitations and Future Directions
Limitations across the UniDex-Dataset family include:
- Restriction to predefined hands or domains (e.g., English-only language instructions, fixed hand morphologies).
- For the dexterous robot manipulation dataset, reliance on human-in-the-loop retargeting, which, while efficient, may introduce subtle artifacts or bottlenecks as task complexity grows.
- Absence, in some cases, of full real-robot validation or domain-randomization across broader physics, sensory, or language conditions.
A plausible implication is that scaling to new platforms or morphologies may require further automation of the retargeting or integration process, as well as expansion of language and task diversity to approach "truly universal" manipulation policy pretraining.