Rig-XL: A 3D Rigging Benchmark Dataset

Updated 5 January 2026

Rig-XL is a large-scale dataset of rigged 3D models that provides standardized annotations, extensive category coverage, and benchmarks for automated skeletal rigging.
The dataset features normalized meshes, detailed skeleton trees with per-vertex skinning weights, and compact tokenization schemes that shorten skeleton sequences for autoregressive models.
Rig-XL employs rigorous filtering, augmentation, and evaluation protocols, enabling robust training and benchmarking for both academic research and industrial applications.

Rig-XL is a large-scale dataset of rigged 3D models developed to enable and benchmark automated skeletal rigging, specifically supporting the UniRig framework. Comprising 14,611 assets with annotated skeletons and per-vertex skinning weights, Rig-XL offers extensive category coverage for diverse object types, including humanoids, animals, inorganics, and non-standard topologies. Rig-XL addresses challenges encountered in previous datasets by providing standardized normalization, rigorous filtering, semantically rich categorization, and comprehensive annotation protocols, facilitating robust model training and evaluation for both academic and industrial rigging tools (Zhang et al., 16 Apr 2025).

1. Dataset Composition and Scope

Rig-XL consists of 14,611 rigged 3D models, each accompanied by:

Mesh geometry available in .obj, .fbx, and .glTF formats.
A single, connected skeleton tree (joint positions and parent indices).
Per-vertex skinning weights.
Optional bone attributes for physics-based “spring bones” where present.

The dataset was sourced primarily from Objaverse-XL and then extensively filtered, augmented, and manually verified. All models are normalized to fit within the cube $[-1,1]^3$ in both geometry and skeleton coordinate space. Each skeleton must form a single connected tree with 10 to 256 bones, which keeps it topologically suitable for rigging systems.
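The normalization and structural constraints above can be sketched as follows. This is a minimal illustration, not the Rig-XL pipeline: it assumes one shared center and scale computed over both mesh vertices and joints, one bone per joint, and a root encoded as `parents[0] == -1`. The function names are hypothetical.

```python
import numpy as np

def normalize_to_cube(vertices, joints):
    """Jointly rescale mesh vertices and joint positions into [-1, 1]^3.

    Assumption: a single center/scale is computed from the union of both
    point sets so that mesh and skeleton stay aligned.
    """
    pts = np.concatenate([vertices, joints], axis=0)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    center = (lo + hi) / 2.0
    half_extent = (hi - lo).max() / 2.0  # half of the longest bounding-box side
    scale = 1.0 / half_extent if half_extent > 0 else 1.0
    return (vertices - center) * scale, (joints - center) * scale

def is_valid_skeleton(parents, min_bones=10, max_bones=256):
    """Check that `parents` encodes one tree rooted at index 0.

    Assumptions: one bone per joint; parents[0] == -1 marks the root;
    every other entry is a 0-based parent index.
    """
    n = len(parents)
    if not (min_bones <= n <= max_bones) or parents[0] != -1:
        return False
    children = [[] for _ in range(n)]
    for i in range(1, n):
        p = parents[i]
        if not (0 <= p < n) or p == i:
            return False
        children[p].append(i)
    # Every joint must be reachable from the root; joints on a cycle or in a
    # separate component never are.
    seen, stack = set(), [0]
    while stack:
        j = stack.pop()
        seen.add(j)
        stack.extend(children[j])
    return len(seen) == n
```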

2. Category Distribution and Topological Diversity

Rig-XL provides explicit coverage across eight semantically defined object categories, enabling evaluation and training on both standard and challenging topologies. Each model is assigned to a single category through automated captioning and classification:

Category	Proportion (%)	Description
Mixamo	≈25	Standard humanoid templates
Biped	≈20	Non-Mixamo two-legged characters
Quadruped	≈15	Four-legged animals
Bird/Flyer	≈10	Avian and flying forms
Insect/Arachnid	≈8	Multi-legged arthropods
Water Creature	≈7	Aquatic organisms
Static Objects	≈5	Inorganic/static (furniture, pillows, etc.)
Other	≈10	Unclassified or miscellaneous

Topological diversity is quantified by bone counts, with a primary mode at 52 bones (reflecting Mixamo full-body rigs) and a secondary mode at 28 bones (mainly Mixamo models lacking hand structures). The minimum and maximum number of bones per asset are 10 and 256, respectively (excluding outliers).

3. Annotation, Tokenization, and Data Representation

Rig-XL advances the annotation and representation of rigged models through several strategies:

Skeleton Tree Tokenization:

Skeletons are encoded into one-dimensional token sequences for autoregressive model training. Each joint coordinate $x \in [-1,1]$ is discretized into one of $D = 256$ bins via

$M(x) = \left\lfloor \frac{x+1}{2} \cdot D \right\rfloor,$

with the inverse mapping

$M^{-1}(d) = \frac{2d}{D} - 1.$
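A minimal NumPy sketch of $M$ and its inverse. The clamp to $D-1$ is an added assumption: as written, $M(1) = D$ falls one past the last valid bin.

```python
import numpy as np

D = 256  # number of discrete bins per coordinate

def quantize(x, d=D):
    """M(x) = floor((x + 1) / 2 * D), clamped to [0, D - 1] (clamp is assumed)."""
    idx = np.floor((np.asarray(x, dtype=np.float64) + 1.0) / 2.0 * d)
    return np.clip(idx, 0, d - 1).astype(np.int64)

def dequantize(tok, d=D):
    """M^{-1}(d) = 2d / D - 1, i.e. the lower edge of each bin."""
    return 2.0 * np.asarray(tok, dtype=np.float64) / d - 1.0
```

Because $M^{-1}$ returns the lower edge of a bin, the round-trip error is at most $2/D \approx 0.0078$ in normalized units; returning the bin center instead would halve that bound.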

Two tokenization schemes are published:

Naïve Sequence: Standard depth-first serialization (average length $\approx$ 266.28 tokens/model).
Optimized Tokenization: Adds class tokens (e.g., <mixamo>), template chain recognition (Mixamo body and hand structures), spring-bone chain grouping via depth-first search, and branch sorting (in descending order of tail coordinates). This reduces the average sequence length to 187.15 tokens per model, a 29.7% reduction.
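Depth-first serialization with branch sorting can be illustrated with a toy sketch. The `<branch>` and `<eos>` markers, the per-joint (x, y, z) token layout, and the lexicographic descending sort key are hypothetical choices for illustration; the published Rig-XL/UniRig vocabulary and ordering may differ.

```python
from typing import List, Sequence, Tuple

BRANCH = "<branch>"  # hypothetical marker token, not from the source
EOS = "<eos>"        # hypothetical end-of-sequence token

def serialize_skeleton(joints: Sequence[Tuple[int, int, int]],
                       parents: Sequence[int]) -> List[object]:
    """Depth-first serialization of a quantized skeleton tree.

    joints: per-joint quantized (x, y, z) tokens; parents[0] == -1 is the root.
    Children are visited in descending coordinate order (an assumed reading
    of "branch sorting"), which makes the output deterministic.
    """
    n = len(joints)
    children = [[] for _ in range(n)]
    for i in range(1, n):
        children[parents[i]].append(i)
    for c in children:
        c.sort(key=lambda j: tuple(joints[j]), reverse=True)

    tokens: List[object] = []
    prev = None
    stack = [0]
    while stack:
        j = stack.pop()
        if prev is not None and parents[j] != prev:
            # Jumping back up the tree: emit a marker plus the parent's
            # coordinates so a decoder can reattach the new branch.
            tokens.append(BRANCH)
            tokens.extend(joints[parents[j]])
        tokens.extend(joints[j])
        prev = j
        # Push in reverse so the first (largest) child is popped first.
        stack.extend(reversed(children[j]))
    tokens.append(EOS)
    return tokens
```

The reported reduction is consistent with the two averages: (266.28 − 187.15) / 266.28 ≈ 0.297.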

File and Metadata Conventions:

Meshes are converted internally into uniform-density point clouds for processing ($N = 65{,}536$ points for skeleton prediction and $N = 16{,}384$ for skinning).
Parent indices are stored 0-based. Root joint index is always 0.
Joint names retain author conventions for maximal compatibility (e.g., “mixamorig:Head”).
Bone connectivity and plausibility are enforced structurally and through filtering of exotic or malformed topologies.
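The source does not state how the uniform-density point clouds are produced; area-weighted surface sampling is one standard way to obtain them and is sketched here as an assumption. The function name is hypothetical, and `n_points` would be 65,536 or 16,384 depending on the task.

```python
import numpy as np

def sample_surface(vertices, faces, n_points, seed=0):
    """Draw n_points samples uniformly by surface area from a triangle mesh."""
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                        # (F, 3, 3) triangle corners
    a, b, c = tri[:, 0], tri[:, 1], tri[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    # Pick triangles in proportion to area so density is uniform per unit area.
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    u, v, w = 1.0 - r1, r1 * (1.0 - r2), r1 * r2
    return (u[:, None] * a[face_idx] + v[:, None] * b[face_idx]
            + w[:, None] * c[face_idx])
```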

4. Evaluation Metrics and Protocols

Rig-XL supports comprehensive quantitative benchmarking of rigging algorithms across multiple metrics:

Joint-to-Joint Chamfer Distance ($CD_{J2J}$):

$CD_{J2J} = \frac{1}{|J_{pred}|} \sum_{p \in J_{pred}} \min_{g \in J_{gt}} \|p - g\|_2 + \frac{1}{|J_{gt}|} \sum_{g \in J_{gt}} \min_{p \in J_{pred}} \|g - p\|_2$
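A direct NumPy transcription of $CD_{J2J}$, assuming the joint sets are given as (P, 3) and (G, 3) arrays. As in the formula, the two directional terms are summed rather than averaged.

```python
import numpy as np

def chamfer_j2j(pred, gt):
    """Symmetric joint-to-joint Chamfer distance with unsquared L2 distances."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (P, G)
    # Mean nearest-GT distance per predicted joint, plus the reverse direction.
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```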

Joint-to-Bone (J2B):

Average distance from each predicted joint to its closest ground-truth bone segment.

Bone-to-Bone (B2B):

Symmetric Chamfer distance between points sampled along predicted and ground-truth bone segments.
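J2B and B2B both treat each parent-child joint pair as a line segment. The sketch below assumes that convention, a root encoded with parent index -1, and k = 10 evenly spaced samples per bone for B2B; the sample count used by the benchmark is not given in the source, and the function names are hypothetical.

```python
import numpy as np

def bone_segments(joints, parents):
    """Return (start, end) arrays for every parent-to-child bone."""
    child = np.array([i for i, p in enumerate(parents) if p >= 0])
    parent = np.array([parents[i] for i in child])
    return joints[parent], joints[child]

def point_to_segments(points, starts, ends):
    """Distance from each point to its nearest segment, shape (P,)."""
    ab = ends - starts                                 # (S, 3)
    ap = points[:, None, :] - starts[None, :, :]       # (P, S, 3)
    denom = np.maximum((ab * ab).sum(-1), 1e-12)       # guard zero-length bones
    t = np.clip((ap * ab[None]).sum(-1) / denom, 0.0, 1.0)
    closest = starts[None] + t[..., None] * ab[None]   # (P, S, 3)
    return np.linalg.norm(points[:, None, :] - closest, axis=-1).min(axis=1)

def j2b(pred_joints, gt_joints, gt_parents):
    """Mean distance from each predicted joint to the closest GT bone."""
    s, e = bone_segments(gt_joints, gt_parents)
    return point_to_segments(pred_joints, s, e).mean()

def b2b(pred_joints, pred_parents, gt_joints, gt_parents, k=10):
    """Symmetric Chamfer distance between points sampled along the bones."""
    def sample(joints, parents):
        s, e = bone_segments(joints, parents)
        t = np.linspace(0.0, 1.0, k)[None, :, None]    # k samples per bone
        return (s[:, None] + t * (e - s)[:, None]).reshape(-1, 3)
    p, g = sample(pred_joints, pred_parents), sample(gt_joints, gt_parents)
    d = np.linalg.norm(p[:, None] - g[None], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```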

Skinning Weight L1 Loss ($L_{skin}$):

$L_{skin} = \frac{1}{N} \sum_{v=1}^N \|w_{pred}(v) - w_{gt}(v)\|_1$

where $N$ is the number of vertices and $w(v) \in \mathbb{R}^J$ is the skinning-weight vector of vertex $v$ over the $J$ bones.
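A NumPy version of $L_{skin}$, assuming the predicted and ground-truth weight matrices share the same vertex order and the same bone (column) order; how bones are matched when the predicted skeleton differs from the ground truth is not specified here.

```python
import numpy as np

def skinning_l1(w_pred, w_gt):
    """L_skin: mean over N vertices of the L1 norm of the weight difference.

    w_pred, w_gt: (N, J) arrays holding one J-dimensional weight vector per vertex.
    """
    return np.abs(w_pred - w_gt).sum(axis=1).mean()
```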

Motion Reconstruction L2 Loss ($L_{motion}$):

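$L_{motion} = \frac{1}{N} \sum_{v=1}^N \|X_{pred}^M(v) - X_{gt}^M(v)\|_2^2$

where $X^M(v)$ is the position of vertex $v$ in animation frame $M$ after the mesh is deformed with the predicted or ground-truth rig; the loss is evaluated for each frame $M$.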
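Evaluating $L_{motion}$ requires posing the mesh. The sketch assumes standard linear blend skinning with per-bone 4×4 rest-to-pose transforms for one frame; the source does not name the deformation model, and the function names are hypothetical.

```python
import numpy as np

def linear_blend_skinning(rest_verts, weights, bone_transforms):
    """Deform vertices with linear blend skinning (assumed deformation model).

    rest_verts: (N, 3); weights: (N, J), rows summing to 1;
    bone_transforms: (J, 4, 4) rest-to-posed transform of each bone.
    """
    homo = np.concatenate([rest_verts, np.ones((len(rest_verts), 1))], axis=1)  # (N, 4)
    per_bone = np.einsum("jab,nb->nja", bone_transforms, homo)  # each bone's result, (N, J, 4)
    blended = np.einsum("nj,nja->na", weights, per_bone)        # weighted blend, (N, 4)
    return blended[:, :3]

def motion_l2(x_pred, x_gt):
    """L_motion for one frame: mean over vertices of the squared L2 error."""
    return ((x_pred - x_gt) ** 2).sum(axis=1).mean()
```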

Benchmarking Protocols:

Main training set: ~14,500 models; validation set: 100 uniformly sampled models (no overlap).
Additional VRoid validation set: 50 anime-style models.
Evaluations are conducted under geometric augmentations (random rotations of $\pm 30^\circ$, scaling in $[0.8, 1.0]$, and motion perturbations).

Comparative performance baselines include RigNet, NBS (Neural Blend-Shapes), TA-Rig, and commercial tools (Meshy, AnythingWorld, AccuRIG, Tripo).

5. Preprocessing, Filtering, and Quality Control

Rig-XL employs a five-stage curation pipeline:

Skeleton-Based Filtering: Retain assets with a single connected skeleton tree and 10–256 bones.
Automated De-duplication and Categorization: Perceptual hashing removes duplicates, and a vision-LLM (ChatGPT-4o) assigns categories.
Manual Verification: Assets are visually inspected with overlaid skeletons, and root-topology issues are repaired by minimum-spanning-tree reconnection.
Training-Time Outlier Removal: During training, models whose reconstruction loss exceeds $10\times$ the average are dynamically excluded.
Normalization & Augmentation: Point clouds are normalized to the cube $[-1,1]^3$, and random rotations, scaling, and motion perturbations expand training-data diversity.

6. Design Challenges and Limitations

Several data-quality challenges are addressed:

Data sourcing: Objaverse-XL consists predominantly of static, unrigged models, so Rig-XL is limited to assets that include both skeleton and skinning information.
Structural anomalies: Filtering targets disconnected components, missing skinning weights, unconnected bones, and implausible skeletal structures (e.g., a root with out-degree $>4$).
Annotation precision: Manual and algorithmic intervention reduces erroneous topology and improves category coherence.

A plausible implication is that, because Rig-XL relies on upstream data quality and strict filtering, some valid but exotic rigs may be excluded.

7. Methodological Integration and Use Cases

Rig-XL is the foundational dataset for UniRig, supporting autoregressive skeleton prediction and bone–point cross-attention skinning. The autoregressive skeleton model is trained with the next-token prediction loss over the token sequence $s_1, \dots, s_T$:

$\mathcal{L}_{NTP} = -\sum_{t=1}^{T} \log P(s_t \mid s_{<t})$
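The next-token prediction objective is the usual cross-entropy summed over a sequence. A minimal NumPy sketch for one sequence, assuming row t of `logits` holds the model's unnormalized scores for $s_t$ given $s_{<t}$ (the function name is hypothetical):

```python
import numpy as np

def ntp_loss(logits, targets):
    """L_NTP = -sum_t log P(s_t | s_<t) for a single sequence.

    logits: (T, V) unnormalized scores; targets: (T,) integer token ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].sum()
```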
