ModelNet40-C: 3D Point Cloud Robustness Benchmark
- ModelNet40-C is the first systematic benchmark for evaluating 3D point cloud classification robustness, featuring 922,440 corrupted samples across 15 corruption types and 5 severity levels.
- It categorizes perturbations into density, noise, and transformation families, enabling detailed error analysis and direct comparison of state-of-the-art architectures.
- Experiments show that combined strategies, such as PointCutMix-R training and TENT-based test-time adaptation, significantly reduce classification errors under severe corruptions.
ModelNet40-C defines the first systematic benchmark for evaluating the robustness of 3D point cloud classification under a broad spectrum of common corruptions. Modeled after corruption benchmarks in the 2D image domain (e.g., ImageNet-C), ModelNet40-C introduces both realistic and artificial perturbations to the canonical ModelNet40 dataset, enabling detailed measurement of the degradation in classification performance across state-of-the-art (SOTA) neural architectures and robustness algorithms (Sun et al., 2022).
1. Dataset Construction and Corruption Taxonomy
ModelNet40-C is derived from the ModelNet40 base set (12,308 samples across 40 object categories, each a resampled 1,024-point CAD-derived cloud). The primary contribution lies in constructing a test suite comprising 922,440 corrupted point clouds, resulting from the Cartesian product of 15 corruption types and 5 severity levels applied to every validation sample.
Corruptions act directly in the 3D domain, grouped into three principal families as outlined below:
| Family | Types | Mechanism |
|---|---|---|
| Density | Occlusion, LiDAR, Local_Density_Inc, Local_Density_Dec, Cutout | Point removal/duplication |
| Noise | Uniform, Gaussian, Impulse, Upsampling, Background | Coordinate perturbation/noise |
| Transformation | Rotation, Shear, FFD, RBF, Inverse-RBF | Global/geometric deformation |
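The full test suite is the Cartesian product of these 15 corruption types with the 5 severity levels. A minimal enumeration sketch (the identifiers below are illustrative; the benchmark repository defines the canonical names):

```python
from itertools import product

# Illustrative identifiers; the ModelNet40-C repo defines the canonical names.
DENSITY = ["occlusion", "lidar", "local_density_inc", "local_density_dec", "cutout"]
NOISE = ["uniform", "gaussian", "impulse", "upsampling", "background"]
TRANSFORM = ["rotation", "shear", "ffd", "rbf", "inv_rbf"]

CORRUPTIONS = DENSITY + NOISE + TRANSFORM   # 15 corruption types
SEVERITIES = range(1, 6)                    # 5 severity levels

# Every evaluation sample is corrupted once per (type, severity) pair.
suite = list(product(CORRUPTIONS, SEVERITIES))
print(len(suite))  # 75 conditions per sample
```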
Each corruption is formally parameterized by a severity index s ∈ {1, …, 5}, which modulates its magnitude (e.g., the noise scale, the number of points added or removed, or the deformation amplitude). Corruptions are constructed so that class semantics are preserved even at the maximum severity level.
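As a concrete instance of severity parameterization, a Gaussian jitter corruption might look like the following sketch; the sigma schedule here is an illustrative assumption, not the benchmark's exact parameters:

```python
import numpy as np

# Illustrative sigma schedule per severity level 1-5 (not the benchmark's exact values).
SIGMA = {1: 0.01, 2: 0.015, 3: 0.02, 4: 0.025, 5: 0.03}

def gaussian_noise(points: np.ndarray, severity: int, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian jitter whose scale grows with the severity index."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, SIGMA[severity], size=points.shape)
    return (points + noise).astype(points.dtype)

cloud = np.random.default_rng(1).uniform(-1, 1, size=(1024, 3)).astype(np.float32)
corrupted = gaussian_noise(cloud, severity=5)
print(corrupted.shape)  # (1024, 3)
```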
2. Evaluation Metrics and Reporting Protocol
Performance is measured using error-rate (ER) and corruption-error (CE) statistics, analogous to the protocol established for 2D image corruptions. For a trained classifier f:
- ER_clean: error rate of f on the clean ModelNet40 test set
- ER_{c,s}: error rate of f under corruption type c at severity level s
The corruption error CE_c (the mean of ER_{c,s} over the five severities) and the mean corruption error mCE (the mean of CE_c over all 15 corruption types) aggregate error rates across severities and corruption types, ensuring direct cross-study comparison. Per-severity trends and a class-wise mean error rate (mER) further elucidate monotonicity and differential semantic impacts.
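Under these definitions, aggregation reduces to simple means over the severity and corruption axes. A sketch of the unnormalized variant (the 2D protocol additionally divides by a reference model's corruption errors before averaging):

```python
import numpy as np

def corruption_errors(er: np.ndarray):
    """er: (15, 5) array of error rates ER_{c,s} in [0, 1].

    Returns per-corruption CE_c (mean over severities) and scalar mCE
    (mean over corruption types). Unnormalized variant; ImageNet-C-style
    protocols divide by a baseline model's CE_c before averaging.
    """
    ce = er.mean(axis=1)          # CE_c, shape (15,)
    return ce, float(ce.mean())   # mCE

er = np.full((15, 5), 0.2)        # toy input: 20% error everywhere
ce, mce = corruption_errors(er)
print(round(mce, 3))  # 0.2
```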
3. Baseline Architectures and Robustness Findings
Experiments evaluated six contemporary point cloud architectures, all trained with identical recipes (1,024 input points, random scaling/translation augmentation, Adam optimizer, label-smoothed cross-entropy). Clean error (ER_clean) and average corrupted error (ER_cor) are established for:
| Model | ER_clean (%) | Key Observations |
|---|---|---|
| PointNet | 9.3 | Robust to density shifts; fragile under transformation |
| PointNet++ | 7.0 | Superior on Background |
| DGCNN | 7.4 | - |
| RSCNN | 7.7 | Filters background noise |
| PCT | 7.1 | Most stable to global transformations |
| SimpleView | 6.1 | - |
On average, ER_cor is roughly three times ER_clean, demonstrating a tripling of error under corruption. The largest degradations occur for Occlusion, LiDAR, and Impulse/Background. There is no universal architecture leader; robustness is type-dependent.
4. Robustness Enhancement Strategies
Both data augmentation (training-time) and test-time adaptation techniques are benchmarked:
4.1 Mixing-based Data Augmentations
- PointCutMix-R: Replaces a randomly sampled subset of one cloud's points with points from another; substantially reduces average ER_cor relative to standard training.
- PointMixup: Linear interpolation between paired clouds; excels on transformation corruptions.
- RSMix: Local neighborhood-level mixing; most effective on density corruptions.
- PGD-AT: Adversarial coordinate perturbations during training; modest gains.
No single method dominates across all corruption types, but CutMix/Mixup-style techniques can roughly halve ER_cor versus standard training.
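A minimal sketch of the PointCutMix-R idea under simplifying assumptions: points are swapped at random between two clouds, and one-hot labels are mixed in proportion to the replaced fraction (the published method includes details, such as alignment between the two clouds, that are omitted here):

```python
import numpy as np

def pointcutmix_r(pc_a, pc_b, y_a, y_b, num_classes, rng, alpha=1.0):
    """Replace a randomly sampled subset of pc_a's points with pc_b's points,
    mixing the one-hot labels by the replaced fraction (illustrative sketch)."""
    n = pc_a.shape[0]
    lam = rng.beta(alpha, alpha)               # mix ratio
    k = int(round(lam * n))                    # number of points taken from pc_b
    idx = rng.choice(n, size=k, replace=False)
    mixed = pc_a.copy()
    mixed[idx] = pc_b[idx]
    label = np.zeros(num_classes)
    label[y_a] += 1 - k / n
    label[y_b] += k / n
    return mixed, label

rng = np.random.default_rng(0)
a = rng.normal(size=(1024, 3))
b = rng.normal(size=(1024, 3))
mixed, label = pointcutmix_r(a, b, y_a=0, y_b=7, num_classes=40, rng=rng)
print(mixed.shape)  # (1024, 3)
```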
4.2 Test-Time Adaptation
- BN-statistics recomputation: Batch-wise normalization statistics are re-estimated at test time (reducing ER_cor by about 5.6 percentage points).
- TENT: Optimizes the BatchNorm affine parameters to minimize prediction entropy (about 6.2 pp reduction in ER_cor), showing the strongest effect on the hardest corruptions (Occlusion, LiDAR, Rotation).
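The BN-statistics idea can be illustrated in isolation: under corruption-induced covariate shift, normalizing with stale training statistics leaves features off-center, while the test batch's own statistics re-center them. TENT goes further by taking gradient steps on the BN affine parameters to minimize the prediction entropy shown below. (A toy sketch, not the benchmark's implementation.)

```python
import numpy as np

def bn_normalize(x, mean, var, eps=1e-5):
    """Standard BatchNorm normalization (affine scale/shift omitted)."""
    return (x - mean) / np.sqrt(var + eps)

def prediction_entropy(logits):
    """TENT's objective: mean Shannon entropy of the softmax predictions."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
train_mean, train_var = np.zeros(8), np.ones(8)     # stats stored at training time
test_feats = rng.normal(0.5, 2.0, size=(256, 8))    # shifted test distribution

stale = bn_normalize(test_feats, train_mean, train_var)          # training stats
adapted = bn_normalize(test_feats, test_feats.mean(0), test_feats.var(0))
print(abs(adapted.mean()) < abs(stale.mean()))  # True: adapted features re-centered
```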
4.3 Combined Schemes
PointCutMix-R during training combined with TENT at inference constitutes the most effective pipeline, particularly with a Transformer-based backbone (PCT).
5. Critical Insights and Interpretations
- Corruption “hot spots” (Occlusion, LiDAR) generate the largest performance drops; even mild spatial transformations (≤15°) are highly disruptive.
- Architectural design is central: global pooling confers density insensitivity, but at a cost to geometric robustness; transformer blocks confer robustness to global deformations.
- Matching augmentation to corruption type is essential (e.g., CutMix-R for noise, Mixup for transformation, RSMix for density).
- Test-time adaptation is nearly as valuable as the best data augmentation, and combining the two is synergistic.
A plausible implication is that hybrid backbones blending local and global feature modeling may offer superior overall robustness.
6. Open Directions and Recommendations
Identified research opportunities include the development of unified architectures integrating both local and global invariances (e.g., convolution+attention hybrids), automated learning of point cloud corruptions (e.g., generative worst-case synthesis), and extension of ModelNet40-C to segmentation and detection. Additionally, theoretical work elucidating the alignment between augmentation and corruption class in the 3D domain remains largely open (Sun et al., 2022).
All code, dataset, and per-type/per-severity results are provided at https://github.com/jiachens/ModelNet40-C.