
ModelNet40-C: 3D Point Cloud Robustness Benchmark

Updated 7 February 2026
  • ModelNet40-C is the first systematic benchmark for evaluating 3D point cloud classification robustness, featuring 922,440 corrupted samples across 15 corruption types and 5 severity levels.
  • It categorizes perturbations into density, noise, and transformation families, enabling detailed error analysis and direct comparison of state-of-the-art architectures.
  • Experiments show that combined strategies, such as PointCutMix-R training and TENT-based test-time adaptation, significantly reduce classification errors under severe corruptions.

ModelNet40-C defines the first systematic benchmark for evaluating the robustness of 3D point cloud classification under a broad spectrum of common corruptions. Modeled after the seminal approach of corruption benchmarks in 2D domains, ModelNet40-C introduces both realistic and artificial perturbations to the canonical ModelNet40 dataset, enabling detailed measurement of the degradation in classification performance across state-of-the-art (SOTA) neural architectures and robustness algorithms (Sun et al., 2022).

1. Dataset Construction and Corruption Taxonomy

ModelNet40-C is derived from the ModelNet40 base set (12,308 samples across 40 object categories, each a resampled 1,024-point CAD-derived cloud). The primary contribution lies in constructing a test suite comprising 922,440 corrupted point clouds, resulting from the Cartesian product of 15 corruption types and 5 severity levels applied to every test sample.

Corruptions act directly in the 3D domain, grouped into three principal families as outlined below:

| Family | Types | Mechanism |
|---|---|---|
| Density | Occlusion, LiDAR, Local_Density_Inc, Local_Density_Dec, Cutout | Point removal/duplication |
| Noise | Uniform, Gaussian, Impulse, Upsampling, Background | Coordinate perturbation/noise |
| Transformation | Rotation, Shear, FFD, RBF, Inverse-RBF | Global/geometric deformation |

Each corruption is formally parameterized by a severity index $s \in \{1, 2, 3, 4, 5\}$, which modulates magnitude (e.g., noise scale $\epsilon_s$, point addition/removal count $K_s$, deformation scale $d_s$). The dataset ensures class semantics are preserved even at the maximum severity level.
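As an illustration of this severity parameterization, the sketch below applies Gaussian coordinate jitter at an increasing noise scale $\epsilon_s$; the per-severity sigma values are illustrative assumptions, not the benchmark's actual constants (those are defined in the ModelNet40-C corruption scripts).

```python
import numpy as np

# Assumed severity schedule for Gaussian jitter (illustrative only; the
# benchmark's exact epsilon_s values live in its corruption scripts).
SIGMA = {1: 0.01, 2: 0.015, 3: 0.02, 4: 0.025, 5: 0.03}

def gaussian_corrupt(points: np.ndarray, severity: int, seed: int = 0) -> np.ndarray:
    """Add i.i.d. Gaussian jitter to an (N, 3) point cloud; a higher
    severity index selects a larger noise scale epsilon_s."""
    rng = np.random.default_rng(seed)
    return points + rng.normal(scale=SIGMA[severity], size=points.shape)

clean = np.zeros((1024, 3))                 # dummy cloud at the origin
mild = gaussian_corrupt(clean, severity=1)
harsh = gaussian_corrupt(clean, severity=5)
```

Because class semantics must survive severity 5, the schedule is kept small relative to typical object scale.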

2. Evaluation Metrics and Reporting Protocol

Performance is measured using several error rate (ER) and corruption error (CE) statistics, analogous to the protocol established for 2D image corruptions. For a trained classifier $f$:

  • $ER_{clean}$: error rate on clean ModelNet40
  • $ER_{c,s}$: error rate on corruption $c$ at severity $s$
  • $ER_c = \frac{1}{5} \sum_{s=1}^{5} ER_{c,s}$
  • $ER_{cor} = \frac{1}{15} \sum_{c=1}^{15} ER_c$

The corruption error $CE_c$ and mean corruption error $mCE$ aggregate error rates across severities and corruption types, ensuring direct cross-study comparison. Per-severity trends and a class-wise mean error rate (mER) further elucidate monotonicity and differential semantic impacts.
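The aggregation above can be sketched directly; the $mCE$ helper follows the 2D-corruption convention of normalizing by a reference model's errors, where the choice of reference model is an assumption of this sketch.

```python
import numpy as np

def aggregate_errors(er: np.ndarray):
    """er: (15, 5) array with er[c, s-1] = ER_{c,s} for corruption type c
    at severity s. Returns per-corruption ER_c and the overall ER_cor."""
    er_c = er.mean(axis=1)      # ER_c   = (1/5)  * sum_s ER_{c,s}
    er_cor = er_c.mean()        # ER_cor = (1/15) * sum_c ER_c
    return er_c, er_cor

def mean_ce(er: np.ndarray, er_reference: np.ndarray) -> float:
    """mCE in the 2D-protocol style: per-type CE_c divides the model's
    summed errors by a reference model's, then averages over types."""
    ce = er.sum(axis=1) / er_reference.sum(axis=1)
    return float(ce.mean())

# A model that errs 20% everywhere, compared against itself:
er = np.full((15, 5), 0.2)
er_c, er_cor = aggregate_errors(er)
```

A model identical to the reference yields $mCE = 1$; values below 1 indicate better relative robustness.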

3. Baseline Architectures and Robustness Findings

Experiments evaluated six contemporary point cloud architectures, all trained with identical recipes (1,024 input points, random scaling/translation, Adam optimizer, label-smoothed cross-entropy). Clean performance ($ER_{clean}$) and average corrupted error ($ER_{cor}$) are established for:

| Model | ER_clean (%) | Key Observations |
|---|---|---|
| PointNet | 9.3 | Robust to density shifts; fragile under transformation |
| PointNet++ | 7.0 | Superior on Background noise |
| DGCNN | 7.4 | – |
| RSCNN | 7.7 | Filters background noise |
| PCT | 7.1 | Most stable to global transformations |
| SimpleView | 6.1 | – |

On average, $ER_{cor} \approx 26.1\%$ versus $ER_{clean} \approx 7.5\%$, a more than threefold increase in error under corruption. The largest degradations occur for Occlusion ($\approx 55\%$), LiDAR ($\approx 72\%$), and Impulse/Background ($\approx 33\%/48\%$). There is no universal architecture leader; robustness is type-dependent.

4. Robustness Enhancement Strategies

Both data augmentation (training-time) and test-time adaptation techniques are benchmarked:

4.1 Mixing-based Data Augmentations

  • PointCutMix-R: Replaces a randomly sampled subset of one cloud's points with points from another; $ER_{cor}$ reduced to $18.7\%$.
  • PointMixup: Linear interpolation of paired clouds; excels on transformation corruptions ($12.7\%$).
  • RSMix: Local neighborhood-level mixing; most effective on density corruptions.
  • PGD-AT: Adversarial coordinate perturbations; modest gains.

No method dominates universally, but CutMix/Mixup techniques can halve $ER_{cor}$ versus standard training.
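A simplified sketch of the random point-level mixing behind PointCutMix-R, under the assumption that a fraction `lam` of points is kept from the first cloud and the paired labels are mixed with the same weight; full implementations typically sample `lam` from a Beta distribution per pair.

```python
import numpy as np

def pointcutmix_r(cloud_a: np.ndarray, cloud_b: np.ndarray,
                  lam: float, rng: np.random.Generator) -> np.ndarray:
    """Keep a fraction `lam` of cloud_a's points and replace the rest
    with the corresponding points from cloud_b. The training labels
    would be mixed with the same weight lam (not shown here)."""
    n = cloud_a.shape[0]
    k = round((1.0 - lam) * n)                 # points drawn from cloud_b
    idx = rng.choice(n, size=k, replace=False)
    mixed = cloud_a.copy()
    mixed[idx] = cloud_b[idx]
    return mixed

rng = np.random.default_rng(0)
a = np.zeros((1024, 3))                        # stand-in for one sample
b = np.ones((1024, 3))                         # stand-in for its mixing partner
m = pointcutmix_r(a, b, lam=0.7, rng=rng)
```

Because replacement happens at the point level rather than in a spatial region, the augmentation mimics outlier-style noise, which is consistent with its strength on noise corruptions.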

4.2 Test-Time Adaptation

  • BN-statistics recomputation: Batch-wise normalization statistics updated at test time ($20.5\%$, −5.6 pp).
  • TENT: Optimizes BatchNorm affine parameters to minimize prediction entropy ($19.9\%$, −6.2 pp), showing the strongest effect on the hardest corruptions (Occlusion, LiDAR, Rotation).
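The two adaptation ideas can be sketched numerically: re-estimating normalization statistics from the test batch itself, and evaluating the entropy objective that TENT minimizes (the gradient updates to the BatchNorm affine parameters are omitted in this sketch).

```python
import numpy as np

def bn_recompute(feats: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize a (batch, features) activation matrix using statistics
    re-estimated from the test batch, discarding the training-time
    running mean/variance."""
    mu = feats.mean(axis=0)
    var = feats.var(axis=0)
    return (feats - mu) / np.sqrt(var + eps)

def prediction_entropy(logits: np.ndarray) -> float:
    """Mean Shannon entropy of the softmax predictions: the objective
    TENT drives down by adapting BatchNorm's affine parameters."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())
```

Confident predictions have low entropy, so minimizing this objective nudges the adapted model toward sharper decisions on the corrupted distribution.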

4.3 Combined Schemes

PointCutMix-R for training combined with TENT at inference constitutes the most effective pipeline ($ER_{cor} \approx 15.4\%$), particularly with a Transformer-based backbone (PCT).

5. Critical Insights and Interpretations

  • Corruption “hot spots” (Occlusion, LiDAR) generate the largest performance drops; even mild spatial transformations (≤15°) are highly disruptive.
  • Architectural design is central: global pooling confers density insensitivity, but at a cost to geometric robustness; transformer blocks confer robustness to global deformations.
  • Matching augmentation to corruption type is essential (e.g., CutMix-R for noise, Mixup for transformation, RSMix for density).
  • Test-time adaptation is nearly as valuable as the best data augmentation and their combination is synergistic.

A plausible implication is that hybrid backbones blending local and global feature modeling may offer superior overall robustness.

6. Open Directions and Recommendations

Identified research opportunities include the development of unified architectures integrating both local and global invariances (e.g., convolution+attention hybrids), automated learning of point cloud corruptions (e.g., generative worst-case synthesis), and extension of ModelNet40-C to segmentation and detection. Additionally, theoretical work elucidating the alignment between augmentation and corruption class in the 3D domain remains largely open (Sun et al., 2022).

All code, dataset, and per-type/per-severity results are provided at https://github.com/jiachens/ModelNet40-C.
