
PartNet: A 3D Part Segmentation Benchmark

Updated 1 April 2026
  • PartNet is a large-scale benchmark of 3D objects with fine-grained, hierarchical part annotations across 24 object categories.
  • It provides detailed semantic, topological, and instance-level labels that support both supervised segmentation and zero-shot learning tasks.
  • The dataset enables advanced 3D vision research by standardizing evaluation in fine-grained semantic, hierarchical, and instance segmentation.

PartNet is a large-scale, fine-grained, and hierarchically annotated benchmark of 3D objects designed to advance 3D part-based object understanding, segmentation, and compositional generalization. Its annotations—encompassing semantic, topological, and instance-level part information across 24 major man-made object categories—provide a standard for both supervised part segmentation and compositional zero-shot learning in the 3D vision community (Mo et al., 2018, Naeem et al., 2021).

1. Dataset Scope and Structure

PartNet contains 26,671 unique 3D models sampled from ShapeNet, with comprehensive part-level annotations spanning 24 object classes such as Chair, Table, Bottle, Lamp, Faucet, and others. The dataset comprises 573,585 part instances, yielding a median of 14 parts per shape (maximum 230) and median hierarchy depth of 3 (maximum 7) (Mo et al., 2018). Models are provided in mesh or point-cloud form, with version-specific differences in the point sampling protocol.

The categories span common household items and more specialized indoor objects. Within each category, models are split 70% for training, 10% for validation, and 20% for testing; category sizes range from 88 shapes (Scissors) to more than 8,000 (Table).
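As a rough illustration of the 70/10/20 protocol, the sketch below cuts one category's shape IDs into three disjoint subsets. It is not the official split procedure (the released benchmark ships fixed split files, which should be used for any comparison); the function name `split_category`, the seed, and the shape IDs are all hypothetical.

```python
import random

def split_category(shape_ids, seed=0, train_frac=0.7, val_frac=0.1):
    """Shuffle one category's shape IDs and cut them into train/val/test."""
    ids = sorted(shape_ids)              # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    n_train = int(round(train_frac * len(ids)))
    n_val = int(round(val_frac * len(ids)))
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],   # whatever remains, roughly 20%
    }

# Hypothetical IDs for a small category of 88 shapes (the size quoted for Scissors).
splits = split_category([f"shape_{i:04d}" for i in range(88)])
print({name: len(ids) for name, ids in splits.items()})
```

Splitting per category rather than over the pooled dataset keeps the ratio stable even for the smallest categories.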

2. Hierarchical and Semantically Consistent Annotations

Each object instance in PartNet is annotated following a deep, hierarchical part taxonomy. An expert-designed And-Or-Graph template defines hierarchical decompositions, using "And" nodes for decomposition into subparts and "Or" nodes for alternative subtypes. Leaf nodes represent atomic, non-decomposable parts, with shared labels across object classes whenever semantic identity permits (e.g., "leg" occurs in both chairs and tables). Templates are well-defined, consistent, hierarchical, and compact, supporting depths up to 7 and as many as 128 fine-grained leaf-level part labels (Mo et al., 2018, Naeem et al., 2021).
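To make the And-Or structure concrete, here is a minimal sketch of such a template as a small tree type. The `Node` class, its field names, and the toy chair template are illustrative assumptions, not the released template format. "And" nodes multiply the number of configurations of their children, while "Or" nodes add up their alternatives.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    kind: str = "leaf"                       # "and", "or", or "leaf"
    children: List["Node"] = field(default_factory=list)

def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)

def leaf_labels(node):
    """Set of atomic part labels reachable from this node."""
    if not node.children:
        return {node.name}
    return set().union(*(leaf_labels(c) for c in node.children))

def count_configurations(node):
    """And: product over subparts. Or: sum over alternative subtypes."""
    if not node.children:
        return 1
    counts = [count_configurations(c) for c in node.children]
    if node.kind == "or":
        return sum(counts)
    total = 1
    for c in counts:
        total *= c
    return total

# Toy template with hypothetical part names.
template = Node("chair", "and", [
    Node("chair_back"),
    Node("chair_seat"),
    Node("chair_base", "or", [
        Node("regular_leg_base", "and", [Node("leg"), Node("bar_stretcher")]),
        Node("star_leg_base", "and",
             [Node("leg"), Node("central_support"), Node("wheel")]),
    ]),
])

print(depth(template), sorted(leaf_labels(template)), count_configurations(template))
```

Note that "leg" appears under both base subtypes but is counted once among the leaf labels, mirroring the label sharing described above.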

Annotation was performed using a web-based 3D Q&A interface and mesh-cut tools. All part segmentation was carried out by 66 professional annotators, and each shape took 8 minutes on average to annotate. After multiple rounds of inter-annotator agreement analysis and template refinement, the average agreement on leaf nodes reached 83.3% (σ=10.4%) (Mo et al., 2018).

3. Benchmarks, Evaluation Protocols, and Metrics

PartNet was released as a benchmark supporting three main tasks:

  • Fine-grained Semantic Segmentation: Assigning each point a part label at coarse, middle, or fine granularity.
  • Hierarchical Semantic Segmentation: Predicting a complete root-to-leaf part hierarchy path for each point.
  • Part Instance Segmentation: Detecting and isolating each individual part instance as a disjoint region.

Evaluation metrics include intersection-over-union (IoU) between predicted and ground-truth part regions,

$$\mathrm{IoU}(P, G) = \frac{|P \cap G|}{|P \cup G|},$$

and mean IoU, which is averaged per-part or per-shape across categories. Mean Average Precision (mAP) at IoU≥0.5 is used in the instance segmentation task. Multi-scale protocols require performance to be reported separately at different segmentation granularities (coarse, middle, fine) (Mo et al., 2018).
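The IoU and mean-IoU definitions above can be sketched for per-point labels as follows. This is a minimal illustration, not the official evaluation script: the function names are hypothetical, and skipping labels that are absent from both prediction and ground truth is an assumption about how empty parts are handled.

```python
import numpy as np

def part_iou(pred, gt, label):
    """IoU between the predicted and ground-truth point sets of one part label."""
    p = pred == label
    g = gt == label
    union = np.logical_or(p, g).sum()
    if union == 0:
        return None                      # label absent from both; skip it (assumption)
    return np.logical_and(p, g).sum() / union

def mean_part_iou(pred, gt, labels):
    """Average the per-part IoU over all labels that occur in pred or gt."""
    ious = [part_iou(pred, gt, k) for k in labels]
    ious = [v for v in ious if v is not None]
    return float(np.mean(ious)) if ious else float("nan")

# Toy shape with 8 points and 3 part labels.
gt = np.array([0, 0, 0, 1, 1, 2, 2, 2])
pred = np.array([0, 0, 1, 1, 1, 2, 2, 0])

print(part_iou(pred, gt, 0))             # 2 shared points out of 4 in the union
print(mean_part_iou(pred, gt, labels=[0, 1, 2]))
```

Averaging these per-part values over shapes, or pooling them per part category first, gives the per-shape and per-part variants of mean IoU mentioned above.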

Representative performance for fine-grained semantic segmentation (mean part-category IoU, in %, at each granularity level) is as follows:

| Method     | Coarse | Middle | Fine | Overall |
|------------|--------|--------|------|---------|
| PointNet   | 57.9   | 37.3   | 35.6 | 51.2    |
| PointNet++ | 65.5   | 44.5   | 42.5 | 58.1    |
| SpiderCNN  | 60.4   | 41.7   | 37.0 | 53.6    |
| PointCNN   | 64.3   | 46.5   | 46.4 | 59.8    |

Baseline approaches for hierarchical segmentation and instance segmentation are also provided, including top-down, bottom-up, and ensemble strategies (Mo et al., 2018).

4. Compositional PartNet (C-PartNet) and Zero-Shot Learning

The Compositional-PartNet (C-PartNet) benchmark extends the original PartNet for compositional zero-shot learning, motivated by the need for part-level generalization to unseen object categories (Naeem et al., 2021). C-PartNet is constructed as follows:

  1. Unification of Part Labels: Pairwise Intersection-over-Union is used to merge original leaf-level labels across categories whenever parts are semantically and geometrically consistent. The merging criterion is

$$\mathrm{IoU}(p_i, p_j) = \frac{|P_i \cap P_j|}{|P_i \cup P_j|} > \tau,$$

where $P_i$ and $P_j$ are normalized sets of points carrying labels $p_i$ and $p_j$, $\tau$ is the overlap threshold, and candidate pairs are confirmed by human inspection.

  2. Final Label Set: The 128 original labels are reduced to 96 unified part classes, with additional manual grouping for synonyms not captured by geometric overlap.
  3. Per-object Part Priors: For each object category $o$, the set of allowed part labels is

$$\mathcal{P}_o = \{p \mid \text{unified part label}\ p\ \text{occurs at least once in}\ o\},$$

which is used to constrain segmentation predictions.

  4. Zero-shot Splits: The 24 object categories are partitioned into 16 seen and 8 unseen classes. Unseen classes are stratified into three difficulty tiers (easy: Bowl, Mug, TrashCan; medium: Dishwasher, Refrigerator, Laptop; hard: Door, Scissors).
  5. Split Statistics:

| Split | #Obj Classes       | #Samples | Note                        |
|-------|--------------------|----------|-----------------------------|
| Train | 16 (seen)          | 16,875   | only seen object classes    |
| Val   | 16 seen + 2 unseen | 2,619    | unseen: Bowl, Dishwasher    |
| Test  | 16 seen + 8 unseen | 5,169    | full zero-shot evaluation   |
| Total | 24                 | 25,900   | all classes, all samples    |

The average number of parts per shape remains ≈19, and the size of the per-object part prior varies by category, for example Chair: 12, Table: 10, Bottle: 4, Mug: 5, Keyboard: 8 (Naeem et al., 2021).
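One simple way to use the per-object part priors defined in the construction steps above is to mask out labels that are not allowed for the object category before taking the per-point argmax. The sketch below assumes this masking approach; it is not necessarily how Naeem et al. apply the prior, and the label names, annotations, and scores are made up for illustration.

```python
import numpy as np

def build_priors(annotations):
    """Collect, for each object category, every part label seen at least once."""
    priors = {}
    for category, part_labels in annotations:
        priors.setdefault(category, set()).update(part_labels)
    return priors

def apply_part_prior(logits, allowed):
    """Per-point argmax over (N, K) scores, restricted to allowed label indices."""
    mask = np.full(logits.shape[1], -np.inf)
    mask[list(allowed)] = 0.0            # allowed labels keep their original score
    return np.argmax(logits + mask, axis=1)

# Hypothetical unified label set and per-shape annotations.
LABELS = ["body", "lid", "handle", "leg"]
annotations = [("Bottle", {"body", "lid"}), ("Mug", {"body", "handle"}),
               ("Bottle", {"body"})]
priors = build_priors(annotations)

logits = np.array([[0.1, 0.2, 0.9, 0.0],
                   [2.0, 0.5, 0.1, 0.3],
                   [0.0, 0.1, 0.2, 3.0]])
allowed = [LABELS.index(p) for p in sorted(priors["Bottle"])]
print(apply_part_prior(logits, allowed))
```

In this toy case the unconstrained argmax would assign "handle" and "leg" to a Bottle, while the masked version can only choose between "body" and "lid".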

The pairwise part overlap ratios (e.g., Bottle vs. Vase: 0.75) quantify compositional part-sharing, critical for zero-shot reasoning.
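The label-unification criterion from step 1 of the construction can be sketched as an overlap test on voxelized point sets. Everything concrete here is an assumption made for illustration: that points are already normalized into the unit cube, the voxel resolution, the threshold value `tau=0.5`, and the label names. The output is only a list of candidate pairs, since the source states that merges are confirmed by human inspection.

```python
from itertools import combinations
import numpy as np

def voxel_set(points, resolution=4):
    """Map points in [0, 1]^3 to the set of voxel indices they occupy."""
    idx = (np.asarray(points, dtype=float) * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    return {tuple(v) for v in idx.tolist()}

def iou(a, b):
    """Intersection-over-union of two voxel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def candidate_merges(parts, tau=0.5):
    """Return label pairs whose occupancy IoU exceeds tau, for manual review."""
    occ = {label: voxel_set(pts) for label, pts in parts.items()}
    pairs = []
    for a, b in combinations(sorted(occ), 2):
        score = iou(occ[a], occ[b])
        if score > tau:
            pairs.append((a, b, score))
    return pairs

# Toy point sets for three hypothetical leaf labels.
parts = {
    "chair/leg": [[0.05, 0.05, z] for z in (0.1, 0.3, 0.6)],
    "table/leg": [[0.05, 0.05, z] for z in (0.1, 0.3, 0.6, 0.9)],
    "lamp/shade": [[0.5, 0.5, 0.9], [0.6, 0.6, 0.9]],
}
print(candidate_merges(parts))
```

Here the two leg labels share three of four occupied voxels, so they are proposed as a merge candidate, while the lamp shade overlaps with neither.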

5. Data Format, Access, and Utility

Shapes are represented as either processed mesh models or uniformly sampled point clouds (typically 2,048 or 10,000 points) with $(x, y, z)$ coordinates and, if available, normals. Annotation files include per-point part labels and, for hierarchical segmentation, tree-structured JSON files specifying parent-child relations and node types. The dataset organization includes category-level and per-split subdirectories. Official splits and annotations, as well as code and benchmark scripts, are accessible at https://cs.stanford.edu/~kaichun/partnet/ (Mo et al., 2018).
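As an illustration of working with a tree-structured hierarchy file, the sketch below walks a small JSON tree and lists every root-to-leaf path, which is the kind of label a hierarchical segmentation model predicts per point. The field names `name` and `children` and the example tree are assumptions; the actual keys should be checked against the released files.

```python
import json

# Hypothetical hierarchy for one shape, inlined so the example is self-contained.
HIERARCHY_JSON = """
{"name": "chair", "children": [
  {"name": "chair_back", "children": [{"name": "back_surface"}]},
  {"name": "chair_base", "children": [{"name": "leg"}, {"name": "leg"}]}
]}
"""

def root_to_leaf_paths(node, prefix=()):
    """Yield one tuple of node names per leaf, from the root down to that leaf."""
    path = prefix + (node["name"],)
    children = node.get("children", [])
    if not children:
        yield path
        return
    for child in children:
        yield from root_to_leaf_paths(child, path)

tree = json.loads(HIERARCHY_JSON)
paths = list(root_to_leaf_paths(tree))
for p in paths:
    print("/".join(p))
```

The two "leg" leaves produce identical paths, which reflects that they share a semantic label while remaining separate part instances.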

Applications include fine-grained shape analysis, articulated scene modeling, affordance and functional reasoning, and robotics (e.g., grasping and manipulation). Recommendations for practitioners include sampling at least 10,000 points for shapes with small components and per-category/per-granularity training to mitigate class imbalance.
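For the recommendation to sample at least 10,000 points, a common way to draw points uniformly from a mesh surface is area-weighted triangle sampling, sketched below with NumPy. This is a generic technique and not necessarily the sampling protocol used to produce the released point clouds; the function name and the per-face label array are hypothetical.

```python
import numpy as np

def sample_points(vertices, faces, n_points=10_000, seed=0):
    """Sample points uniformly over a triangle mesh surface."""
    rng = np.random.default_rng(seed)
    tri = vertices[faces]                                  # (F, 3, 3) corner coords
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    # Pick faces with probability proportional to their area.
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric weights via the square-root trick.
    r1 = np.sqrt(rng.random(n_points))
    r2 = rng.random(n_points)
    w = np.stack([1 - r1, r1 * (1 - r2), r1 * r2], axis=1)  # (N, 3), rows sum to 1
    points = (w[:, :, None] * tri[face_idx]).sum(axis=1)
    return points, face_idx

# Toy mesh: a unit square in the z=0 plane, made of two triangles.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])
face_labels = np.array([0, 1])           # hypothetical per-face part labels

points, face_idx = sample_points(vertices, faces)
point_labels = face_labels[face_idx]     # each point inherits its face's label
print(points.shape, np.bincount(point_labels))
```

Because faces are chosen in proportion to area, small components receive few points at low sample counts, which is why a larger budget helps for shapes with fine parts.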

6. Impact and Research Significance

PartNet is a widely used benchmark for 3D part-level understanding, underpinning research in semantic and instance segmentation, hierarchical decomposition, shape analysis, and zero-shot learning. Its compositional design, taxonomic rigor, and large scale have standardized performance evaluation and fostered the development of novel 3D deep learning methods. C-PartNet directly addresses the challenge of compositionality: it demonstrates that conventional methods fail to generalize to unseen object categories under unified part ontologies, and it provides a diagnostic testbed for part-level transfer in 3D vision (Naeem et al., 2021).

A plausible implication is that future progress in compositional zero-shot learning for 3D segmentation will require methods that reason explicitly over unified part representations and per-object priors, as instantiated in the C-PartNet protocol.
