ModelNet40-C: 3D Point Cloud Robustness Benchmark
- ModelNet40-C is the first systematic benchmark for evaluating 3D point cloud classification robustness, featuring 922,440 corrupted samples across 15 corruption types and 5 severity levels.
- It categorizes perturbations into density, noise, and transformation families, enabling detailed error analysis and direct comparison of state-of-the-art architectures.
- Experiments show that combined strategies, such as PointCutMix-R training and TENT-based test-time adaptation, significantly reduce classification errors under severe corruptions.
ModelNet40-C defines the first systematic benchmark for evaluating the robustness of 3D point cloud classification under a broad spectrum of common corruptions. Modeled after corruption benchmarks in the 2D image domain (e.g., ImageNet-C), ModelNet40-C introduces both realistic and artificial perturbations to the canonical ModelNet40 dataset, enabling detailed measurement of the degradation in classification performance across state-of-the-art (SOTA) neural architectures and robustness algorithms (Sun et al., 2022).
1. Dataset Construction and Corruption Taxonomy
ModelNet40-C is derived from the ModelNet40 base set (12,308 samples across 40 object categories, each a resampled 1,024-point CAD-derived cloud). The primary contribution lies in constructing a test suite comprising 922,440 corrupted point clouds, resulting from the Cartesian product of 15 corruption types and 5 severity levels applied to every validation sample.
Corruptions act directly in the 3D domain, grouped into three principal families as outlined below:
| Family | Types | Mechanism |
|---|---|---|
| Density | Occlusion, LiDAR, Local_Density_Inc, Local_Density_Dec, Cutout | Point removal/duplication |
| Noise | Uniform, Gaussian, Impulse, Upsampling, Background | Coordinate perturbation/noise |
| Transformation | Rotation, Shear, FFD, RBF, Inverse-RBF | Global/geometric deformation |
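The full test suite is the Cartesian product of these 15 corruption types with the 5 severity levels. A minimal enumeration sketch (the identifiers below are illustrative; the benchmark repository defines the canonical names):

```python
from itertools import product

# Illustrative identifiers; the ModelNet40-C repo defines the canonical names.
DENSITY = ["occlusion", "lidar", "local_density_inc", "local_density_dec", "cutout"]
NOISE = ["uniform", "gaussian", "impulse", "upsampling", "background"]
TRANSFORM = ["rotation", "shear", "ffd", "rbf", "inv_rbf"]

CORRUPTIONS = DENSITY + NOISE + TRANSFORM   # 15 corruption types
SEVERITIES = range(1, 6)                    # 5 severity levels

# Every evaluation sample is corrupted once per (type, severity) pair.
suite = list(product(CORRUPTIONS, SEVERITIES))
print(len(suite))  # 75 conditions per sample
```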
Each corruption is formally parameterized by a severity index s ∈ {1, …, 5}, which modulates its magnitude (e.g., the noise scale, the number of points added or removed, or the deformation amplitude). Corruptions are constructed so that class semantics are preserved even at the maximum severity level.
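As a concrete instance of severity parameterization, a Gaussian jitter corruption might look like the following sketch; the sigma schedule here is an illustrative assumption, not the benchmark's exact parameters:

```python
import numpy as np

# Illustrative sigma schedule per severity level 1-5 (not the benchmark's exact values).
SIGMA = {1: 0.01, 2: 0.015, 3: 0.02, 4: 0.025, 5: 0.03}

def gaussian_noise(points: np.ndarray, severity: int, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian jitter whose scale grows with the severity index."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, SIGMA[severity], size=points.shape)
    return (points + noise).astype(points.dtype)

cloud = np.random.default_rng(1).uniform(-1, 1, size=(1024, 3)).astype(np.float32)
corrupted = gaussian_noise(cloud, severity=5)
print(corrupted.shape)  # (1024, 3)
```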
2. Evaluation Metrics and Reporting Protocol
Performance is measured using error-rate (ER) and corruption-error (CE) statistics, analogous to the protocol established for 2D image corruptions. For a trained classifier f:
- ER_clean: error rate of f on the clean ModelNet40 test set
- ER_{c,s}: error rate of f under corruption type c at severity level s
The corruption error CE_c (the mean of ER_{c,s} over the five severities) and the mean corruption error mCE (the mean of CE_c over all 15 corruption types) aggregate error rates across severities and corruption types, ensuring direct cross-study comparison. Per-severity trends and a class-wise mean error rate (mER) further elucidate monotonicity and differential semantic impacts.
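Under these definitions, aggregation reduces to simple means over the severity and corruption axes. A sketch of the unnormalized variant (the 2D protocol additionally divides by a reference model's corruption errors before averaging):

```python
import numpy as np

def corruption_errors(er: np.ndarray):
    """er: (15, 5) array of error rates ER_{c,s} in [0, 1].

    Returns per-corruption CE_c (mean over severities) and scalar mCE
    (mean over corruption types). Unnormalized variant; ImageNet-C-style
    protocols divide by a baseline model's CE_c before averaging.
    """
    ce = er.mean(axis=1)          # CE_c, shape (15,)
    return ce, float(ce.mean())   # mCE

er = np.full((15, 5), 0.2)        # toy input: 20% error everywhere
ce, mce = corruption_errors(er)
print(round(mce, 3))  # 0.2
```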
3. Baseline Architectures and Robustness Findings
Experiments evaluated six contemporary point cloud architectures, all trained with identical recipes (1,024 input points, random scaling/translation augmentation, Adam optimizer, label-smoothed cross-entropy). Clean error (ER_clean) and average corrupted error (ER_cor) are established for:
| Model | ER_clean (%) | Key Observations |
|---|---|---|
| PointNet | 9.3 | Robust to density shifts; fragile under transformation |
| PointNet++ | 7.0 | Superior on Background |
| DGCNN | 7.4 | - |
| RSCNN | 7.7 | Filters background noise |
| PCT | 7.1 | Most stable to global transformations |
| SimpleView | 6.1 | - |
On average, ER_cor is roughly three times ER_clean, demonstrating a tripling of error under corruption. The largest degradations occur for Occlusion, LiDAR, and Impulse/Background. There is no universal architecture leader; robustness is type-dependent.
4. Robustness Enhancement Strategies
Both data augmentation (training-time) and test-time adaptation techniques are benchmarked:
4.1 Mixing-based Data Augmentations
- PointCutMix-R: Replaces a randomly sampled subset of one cloud's points with points from another; substantially reduces average ER_cor relative to standard training.
- PointMixup: Linear interpolation between paired clouds; excels on transformation corruptions.
- RSMix: Local neighborhood-level mixing; most effective on density corruptions.
- PGD-AT: Adversarial coordinate perturbations during training; modest gains.
No single method dominates across all corruption types, but CutMix/Mixup-style techniques can roughly halve ER_cor versus standard training.
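A minimal sketch of the PointCutMix-R idea under simplifying assumptions: points are swapped at random between two clouds, and one-hot labels are mixed in proportion to the replaced fraction (the published method includes details, such as alignment between the two clouds, that are omitted here):

```python
import numpy as np

def pointcutmix_r(pc_a, pc_b, y_a, y_b, num_classes, rng, alpha=1.0):
    """Replace a randomly sampled subset of pc_a's points with pc_b's points,
    mixing the one-hot labels by the replaced fraction (illustrative sketch)."""
    n = pc_a.shape[0]
    lam = rng.beta(alpha, alpha)               # mix ratio
    k = int(round(lam * n))                    # number of points taken from pc_b
    idx = rng.choice(n, size=k, replace=False)
    mixed = pc_a.copy()
    mixed[idx] = pc_b[idx]
    label = np.zeros(num_classes)
    label[y_a] += 1 - k / n
    label[y_b] += k / n
    return mixed, label

rng = np.random.default_rng(0)
a = rng.normal(size=(1024, 3))
b = rng.normal(size=(1024, 3))
mixed, label = pointcutmix_r(a, b, y_a=0, y_b=7, num_classes=40, rng=rng)
print(mixed.shape)  # (1024, 3)
```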
4.2 Test-Time Adaptation
- BN-statistics recomputation: Batch-wise normalization statistics are re-estimated at test time (reducing ER_cor by about 5.6 percentage points).
- TENT: Optimizes the BatchNorm affine parameters to minimize prediction entropy (about 6.2 pp reduction in ER_cor), showing the strongest effect on the hardest corruptions (Occlusion, LiDAR, Rotation).
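The BN-statistics idea can be illustrated in isolation: under corruption-induced covariate shift, normalizing with stale training statistics leaves features off-center, while the test batch's own statistics re-center them. TENT goes further by taking gradient steps on the BN affine parameters to minimize the prediction entropy shown below. (A toy sketch, not the benchmark's implementation.)

```python
import numpy as np

def bn_normalize(x, mean, var, eps=1e-5):
    """Standard BatchNorm normalization (affine scale/shift omitted)."""
    return (x - mean) / np.sqrt(var + eps)

def prediction_entropy(logits):
    """TENT's objective: mean Shannon entropy of the softmax predictions."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
train_mean, train_var = np.zeros(8), np.ones(8)     # stats stored at training time
test_feats = rng.normal(0.5, 2.0, size=(256, 8))    # shifted test distribution

stale = bn_normalize(test_feats, train_mean, train_var)          # training stats
adapted = bn_normalize(test_feats, test_feats.mean(0), test_feats.var(0))
print(abs(adapted.mean()) < abs(stale.mean()))  # True: adapted features re-centered
```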
4.3 Combined Schemes
PointCutMix-R during training combined with TENT at inference constitutes the most effective pipeline, particularly with a Transformer-based backbone (PCT).
5. Critical Insights and Interpretations
- Corruption “hot spots” (Occlusion, LiDAR) generate the largest performance drops; even mild spatial transformations (≤15°) are highly disruptive.
- Architectural design is central: global pooling confers density insensitivity, but at a cost to geometric robustness; transformer blocks confer robustness to global deformations.
- Matching augmentation to corruption type is essential (e.g., CutMix-R for noise, Mixup for transformation, RSMix for density).
- Test-time adaptation is nearly as valuable as the best data augmentation, and combining the two is synergistic.
A plausible implication is that hybrid backbones blending local and global feature modeling may offer superior overall robustness.
6. Open Directions and Recommendations
Identified research opportunities include the development of unified architectures integrating both local and global invariances (e.g., convolution+attention hybrids), automated learning of point cloud corruptions (e.g., generative worst-case synthesis), and extension of ModelNet40-C to segmentation and detection. Additionally, theoretical work elucidating the alignment between augmentation and corruption class in the 3D domain remains largely open (Sun et al., 2022).
All code, dataset, and per-type/per-severity results are provided at https://github.com/jiachens/ModelNet40-C.