Tex-Parts: Unified 3D Part Annotations
- Tex-Parts is a large-scale dataset of densely labeled, semantically consistent 3D part annotations whose part ontology is unified from PartNet, 3DCoMPaT++, and Find3D.
- It uses a human-in-the-loop annotation pipeline built on the ALIGN-Parts engine, achieving a 5–8× speedup in segmentation and naming over manual annotation.
- The dataset features a unified canonical ontology with 1,794 unique part names, enabling open-vocabulary, object-conditional segmentation and robust semantic evaluation.
Tex-Parts is a large-scale dataset providing densely labeled, semantically consistent 3D part annotations, derived from the unannotated TexVerse corpus using the ALIGN-Parts engine. Its design enables open-vocabulary, object-conditional 3D part segmentation and naming, generating per-point part masks and affordance-aware part names for thousands of richly annotated 3D models. Tex-Parts addresses inconsistencies in existing datasets' part definitions by merging PartNet, 3DCoMPaT++, and Find3D into a single, unified part ontology of 1,794 canonical names. Annotation is accelerated using a combination of automated proposals and human-in-the-loop verification, delivering a 5–8× speedup over manual methods while maintaining semantic precision (Paul et al., 19 Dec 2025).
1. Motivation and Objectives
Tex-Parts was conceived in response to persistent limitations in 3D part segmentation resources. Available benchmarks such as PartNet, 3DCoMPaT++, and Find3D each employ distinct part taxonomies and granularities, impeding cross-dataset training and transfer. Tex-Parts aims to fill this gap by offering:
- A large-scale, open-vocabulary 3D part dataset with labels consistent across object categories and grounded in human-centric affordance descriptions.
- Seamless integration of the ALIGN-Parts model to propose both geometric part segments ("partlets") and natural-language part names, supporting open-vocabulary and object-conditional annotation.
- Efficient, verifiable annotation: a human annotator validates or corrects a complete object's segmentation and naming in 3–5 minutes versus 15–25 minutes for traditional de novo annotation, yielding a speedup factor of 5–8×.
- A publicly released resource with per-point part masks and affordance-aware names, supporting scalable development and evaluation for semantic 3D part segmentation.
2. Unified Canonical Part Taxonomy
A central feature of Tex-Parts is its canonical part ontology, constructed by unifying the part vocabularies of PartNet, 3DCoMPaT++, and Find3D:
- Each original part name from these datasets is embedded using an MPNet sentence encoder.
- Agglomerative clustering in the embedding space identifies candidate merges (e.g., “microwave” vs. “microwave_oven”).
- High-similarity pairs are passed to a Gemini LLM, which judges whether the two names are functionally equivalent; equivalent names are collapsed, while distinct names remain separate (e.g., “front_bumper” vs. “rear_bumper”).
- An alias table records mappings from each original label (e.g., PartNet’s “bed_footboard”) to the canonical label (“footboard”).
This process results in a unified, flat vocabulary comprising 1,794 unique part names. Each object category draws from its relevant subset (such as “engine” and “wing” for airplanes), ensuring that only semantically appropriate part names are considered during annotation and inference. This global vocabulary enhances label consistency across categories and datasets (Paul et al., 19 Dec 2025).
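The merge procedure above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 3-dimensional vectors stand in for MPNet embeddings, a simple pairwise cosine threshold replaces agglomerative clustering, and the hypothetical `is_equivalent` function stands in for the LLM judgment. The 0.9 threshold and the rule of keeping the shorter name as canonical are assumptions made for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def candidate_pairs(embeddings, threshold=0.9):
    """Return name pairs whose embeddings are similar enough to review."""
    names = sorted(embeddings)
    return [(a, b) for a, b in combinations(names, 2)
            if cosine(embeddings[a], embeddings[b]) >= threshold]

def build_alias_table(names, pairs, is_equivalent):
    """Collapse pairs judged equivalent; map every name to a canonical name."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in pairs:
        if is_equivalent(a, b):
            ra, rb = find(a), find(b)
            if ra != rb:
                # Assumption: the shorter name becomes the canonical one.
                keep, drop = sorted((ra, rb), key=lambda s: (len(s), s))
                parent[drop] = keep
    return {n: find(n) for n in names}

# Toy vectors standing in for MPNet sentence embeddings (hypothetical values).
embeddings = {
    "microwave": (1.0, 0.0, 0.1),
    "microwave_oven": (0.98, 0.05, 0.12),
    "front_bumper": (0.0, 1.0, 0.2),
    "rear_bumper": (0.05, 0.97, 0.25),
}

# Hypothetical stand-in for the LLM equivalence judgment.
EQUIVALENT = {frozenset({"microwave", "microwave_oven"})}
def is_equivalent(a, b):
    return frozenset({a, b}) in EQUIVALENT

pairs = candidate_pairs(embeddings)
aliases = build_alias_table(embeddings, pairs, is_equivalent)
```

In this toy run both pairs pass the similarity threshold, but only the microwave pair is collapsed, which mirrors the distinction the text draws between true synonyms and similar yet functionally distinct names.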
3. Data Collection and Annotation Workflow
Tex-Parts is sourced from TexVerse, a public corpus of over 850,000 high-quality textured meshes. The annotation pipeline comprises three main stages:
- Pre-filtering: Each mesh undergoes automated assessment using shape thumbnails, metadata, and polycount; a Gemini model filters out low-quality or broken assets.
- Automated Proposal: Remaining meshes are sampled to 10,000 surface points and processed by ALIGN-Parts, which predicts up to K=32 partlets. For each partlet, the model outputs a soft point mask, a text-embedding prototype aligned to an affordance description, and a dual-source confidence score (combining softmax alignment and Mahalanobis distance).
- Human Verification: Shapes are ranked by partlet confidence scores and distributed to annotators in two phases:
- Phase I: Annotators accept or edit proposed partlet masks and names, aided by an autocomplete prompt to facilitate naming.
- Phase II: Annotators supplement any missing parts, if necessary.
By prioritizing high-confidence predictions, the pipeline optimizes annotator efficiency while maintaining segmentation and naming accuracy (Paul et al., 19 Dec 2025).
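The prioritization step can be sketched as a simple ranking. The source does not specify how partlet confidences are aggregated per shape, so the mean used below is an assumption, as are the function name `rank_shapes` and the example shape identifiers.

```python
from statistics import mean

def rank_shapes(proposals):
    """Order shape IDs from most to least confident.

    proposals maps a shape ID to the list of per-partlet confidence
    scores in [0, 1]. Assumption: a shape's score is the mean of its
    partlet confidences; ties are broken by shape ID for determinism.
    """
    scored = {sid: mean(confs) for sid, confs in proposals.items() if confs}
    return sorted(scored, key=lambda sid: (-scored[sid], sid))

# Hypothetical partlet confidences for three shapes.
proposals = {
    "chair_01": [0.9, 0.8, 0.95],
    "lamp_07": [0.4, 0.6],
    "car_03": [0.9, 0.9],
}
queue = rank_shapes(proposals)
```

Shapes at the front of `queue` would reach annotators first, so that the proposals most likely to need only light correction are verified early.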
4. Dataset Composition and Statistics
The current release of Tex-Parts features:
- Number of annotated shapes: 8,450
- Canonical part vocabulary: 1,794 names
- Total unique part categories instantiated: ≈14,000 (reflecting object-conditional subsets and additional Phase II insertions)
- Average parts per shape: ≈12 (most in the 3–20 range; few above 28)
- Planned train/val/test split: 80/10/10 by shape, with no overlap in object subcategories across folds
- Annotation speedup: 5–8× relative to conventional manual mesh segmentation tools
This scale and consistency facilitate transfer learning and robust benchmarking for semantic 3D part segmentation tasks (Paul et al., 19 Dec 2025).
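One way to obtain an approximately 80/10/10 split with no subcategory shared between folds is to assign whole subcategories rather than individual shapes. The sketch below is a greedy heuristic written for illustration only; the source does not describe the actual splitting procedure, and the subcategory names and sizes are hypothetical.

```python
from collections import defaultdict

def split_by_subcategory(shape_to_subcat,
                         ratios=(("train", 0.8), ("val", 0.1), ("test", 0.1))):
    """Assign whole subcategories to folds, approximating the target ratios."""
    groups = defaultdict(list)
    for shape_id, subcat in sorted(shape_to_subcat.items()):
        groups[subcat].append(shape_id)

    total = len(shape_to_subcat)
    targets = {name: frac * total for name, frac in ratios}
    folds = {name: [] for name, _ in ratios}

    # Place the largest subcategories first, each into the fold that is
    # currently furthest below its target size.
    for subcat in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        name = max(folds, key=lambda n: targets[n] - len(folds[n]))
        folds[name].extend(groups[subcat])
    return folds

# Hypothetical subcategory sizes (20 shapes in total).
sizes = {"office_chair": 8, "armchair": 5, "stool": 3,
         "bench": 2, "sofa": 1, "recliner": 1}
shape_to_subcat = {f"{sub}_{i}": sub for sub, n in sizes.items() for i in range(n)}
folds = split_by_subcategory(shape_to_subcat)
```

Because subcategories are indivisible, the realized proportions only approximate the targets when group sizes are uneven.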
5. File Formats, Metadata, and Licensing
Each Tex-Parts example is distributed with the following components:
- mesh.obj, mesh.mtl: The original TexVerse triangle mesh and material definitions.
- points.ply: 10,000 sampled surface points, each recording XYZ coordinates, normals, and RGB color.
- annotation.json: Contains all segmentation and semantic annotation metadata, including:
- parts: Array of objects, each with:
  - part_id: Integer identifier
  - name: Canonical part label
  - affordance_description: Text description of part function
  - mask: Indices of member points
  - conf_soft: Softmax-based confidence
  - conf_maha: Mahalanobis-based confidence
- global: Object class and confidence
- aliases: Mapping from original tags or groups to canonical part IDs
All associated code is released under the Apache 2.0 license; the dataset itself adopts the TexVerse CC BY 4.0 license (Paul et al., 19 Dec 2025).
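As an illustration of how the annotation file might be consumed, the sketch below converts the per-part point-index masks into a single per-point label array. The field names follow the list above, but the exact JSON layout, the toy example, and the helper name `per_point_labels` are assumptions; a real file would cover 10,000 points rather than 8.

```python
import json

def per_point_labels(annotation, num_points=10_000, unlabeled=-1):
    """Build a list giving the part_id of each point (or `unlabeled`)."""
    labels = [unlabeled] * num_points
    for part in annotation["parts"]:
        for idx in part["mask"]:
            labels[idx] = part["part_id"]
    return labels

# Toy stand-in for the contents of annotation.json (hypothetical values).
raw = """
{
  "parts": [
    {"part_id": 0, "name": "seat",
     "affordance_description": "surface that supports a sitting person",
     "mask": [0, 1, 2], "conf_soft": 0.93, "conf_maha": 0.88},
    {"part_id": 1, "name": "leg",
     "affordance_description": "support that holds the seat off the floor",
     "mask": [3, 4], "conf_soft": 0.81, "conf_maha": 0.79}
  ]
}
"""
annotation = json.loads(raw)
labels = per_point_labels(annotation, num_points=8)
id_to_name = {p["part_id"]: p["name"] for p in annotation["parts"]}
```

A dense label array of this kind is a common input format for segmentation losses and IoU computations, while the sparse index masks are more compact on disk.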
6. Evaluation Metrics
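Tex-Parts introduces specialized metrics for evaluating named 3D part segmentation, capturing both geometric and semantic correctness. Let $\mathcal{G} = \{G_i\}_{i=1}^{N}$ represent the set of ground-truth part masks and $\mathcal{P} = \{P_j\}_{j=1}^{M}$ the set of predictions. With $\mathrm{IoU}(G_i, P_j) = |G_i \cap P_j| / |G_i \cup P_j|$ and $\pi(i)$ the index of the prediction matched to ground-truth part $i$, the following metrics are defined:
- Class-agnostic mIoU: $\mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}(G_i, P_{\pi(i)})$ (focused solely on geometric overlap).
- Label-Aware mIoU (LA-mIoU): $\text{LA-mIoU} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}(G_i, P_{\pi(i)}) \cdot \mathbb{1}[\ell(G_i) = \ell(P_{\pi(i)})]$, where $\ell(\cdot)$ denotes the part label index; any label mismatch incurs zero reward.
- Relaxed Label-Aware mIoU (rLA-mIoU): $\text{rLA-mIoU} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}(G_i, P_{\pi(i)}) \cdot \operatorname{sim}(\mathbf{e}(G_i), \mathbf{e}(P_{\pi(i)}))$, where $\mathbf{e}(\cdot)$ is the MPNet embedding of the part's affordance description and $\operatorname{sim} \in [0, 1]$ measures embedding similarity; partial credit is awarded for semantically similar predictions.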
Empirical results on a held-out test set of 206 shapes are reported as Pearson's $r$ and Spearman's $\rho$ correlations. The relationships $\text{LA-mIoU} \le \text{rLA-mIoU} \le \mathrm{mIoU}$ hold by metric design (Paul et al., 19 Dec 2025).
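The three metrics can be illustrated with a small sketch that follows the verbal descriptions above. Several choices here are assumptions rather than the paper's definitions: each ground-truth part is matched to the prediction with the highest IoU, semantic similarity is a cosine clipped at zero, and the 2-dimensional vectors stand in for MPNet embeddings.

```python
import math

def iou(a, b):
    """Intersection over union of two collections of point indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def part_metrics(gt, pred, embed):
    """Return (mIoU, LA-mIoU, rLA-mIoU).

    gt and pred are lists of (label, point_indices); embed maps a label
    to a vector. Assumption: greedy best-IoU matching per ground-truth part.
    """
    if not gt or not pred:
        return 0.0, 0.0, 0.0
    miou = la = rla = 0.0
    for g_label, g_mask in gt:
        p_label, p_mask = max(pred, key=lambda p: iou(g_mask, p[1]))
        overlap = iou(g_mask, p_mask)
        miou += overlap                               # geometry only
        la += overlap * (g_label == p_label)          # exact label match required
        rla += overlap * max(0.0, cosine(embed[g_label], embed[p_label]))
    n = len(gt)
    return miou / n, la / n, rla / n

# Toy example: the second part is half-covered and named "foot" instead of "leg".
gt = [("seat", range(0, 4)), ("leg", range(4, 8))]
pred = [("seat", range(0, 4)), ("foot", range(4, 6))]
embed = {"seat": (1.0, 0.0), "leg": (0.0, 1.0), "foot": (0.6, 0.8)}  # hypothetical
miou, la_miou, rla_miou = part_metrics(gt, pred, embed)
```

In this example the near-synonym "foot" earns no credit under the strict label-aware metric but most of its geometric credit under the relaxed one, so the relaxed score falls between the other two.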
7. Significance and Benchmarking
Tex-Parts establishes a new standard for open-vocabulary, semantically grounded 3D part segmentation. Its unified ontology and annotation methodology support models capable of producing complete, disjoint decompositions into human-readable, affordance-aware part labels. The dataset is anticipated to facilitate community-wide advances in training and evaluating models requiring consistent, scalable, and semantically meaningful 3D segmentation (Paul et al., 19 Dec 2025).