
Understanding Imbalanced Semantic Segmentation Through Neural Collapse (2301.01100v1)

Published 3 Jan 2023 in cs.CV and cs.LG

Abstract: A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.

Citations (37)

Summary

  • The paper introduces the Center Collapse Regularizer (CeCo) to leverage neural collapse theory for mitigating class imbalance in semantic segmentation.
  • It employs a dual-branch framework that aligns within-class feature centers to a simplex ETF structure, enhancing discrimination of underrepresented classes.
  • Empirical results on datasets like ScanNet200 and ADE20K show significant performance improvements and compatibility with various segmentation architectures.

This paper, "Understanding Imbalanced Semantic Segmentation Through Neural Collapse" (2301.01100), explores the phenomenon of neural collapse in the context of semantic segmentation, particularly focusing on the challenges posed by imbalanced class distributions inherent in such tasks. Neural collapse, previously observed in image classification, describes the convergence of last-layer features and classifier weights to a highly symmetric structure known as a simplex equiangular tight frame (ETF) during training on balanced datasets. This structure provides maximal angular separation between class representatives, which is beneficial for discrimination.

The authors observe that this elegant neural collapse structure does not fully emerge in semantic segmentation models trained on typical datasets like ScanNet200, ADE20K, and COCO-Stuff164K. They identify two key reasons: the contextual correlation between classes in dense prediction tasks (neighboring pixels/points are often semantically related) and the significant class imbalance, in which some categories occupy vastly more area or points than others. The failure to reach the symmetric ETF structure, particularly for feature centers, is shown to hurt the performance of minor classes, whose features and classifier vectors may end up too close to those of major classes.

To address this, the paper proposes a practical method called Center Collapse Regularizer (CeCo). The core idea is to encourage the within-class feature centers to converge towards a simplex ETF structure during training, thereby preserving the desirable equiangular and maximally separated properties that benefit discriminative learning, especially for underrepresented classes.

Implementation Details and Architecture:

CeCo is designed as an auxiliary training mechanism that can be easily integrated into existing semantic segmentation architectures. The overall framework consists of two branches:

  1. Point/Pixel Recognition Branch: This is the standard segmentation model pipeline (e.g., an FCN, UperNet, DeepLabV3+, or MinkowskiNet) which takes the input image or point cloud and outputs per-pixel/point features. This branch is trained using a standard semantic segmentation loss, typically Cross-Entropy ($\mathcal{L}_{\rm PR}$).
  2. Center Regularization Branch: This new branch operates on the output features from the main backbone (a code sketch follows this list):
    • For each training sample, the features $z_i$ for all pixels/points are extracted.
    • Based on the ground truth labels $y_i$, the within-class mean feature vectors (feature centers) $\bar{z}_k$ for each class $k$ are computed by averaging the features $z_i$ belonging to that class.
    • These computed feature centers $\bar{z}_k$ are then fed into a classifier layer in the Center Regularization Branch. Crucially, this classifier is fixed to a simplex ETF structure, constructed from the number of classes $K$. This fixed structure acts as a target geometry for the feature centers.
    • A separate Cross-Entropy loss ($\mathcal{L}_{\rm CR}$) is computed between the predictions of this fixed ETF classifier on the feature centers $\bar{z}_k$ and their corresponding class labels $k$.
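The following PyTorch sketch illustrates how such a branch could be implemented. It is a minimal illustration under stated assumptions, not the authors' released code: the helper names (`build_etf_classifier`, `center_regularization_loss`), the flattened `(N, D)` feature layout, and the `ignore_index` convention are assumptions of this example.

```python
import torch
import torch.nn.functional as F


def build_etf_classifier(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return a fixed (num_classes, feat_dim) simplex-ETF weight matrix.

    Any two rows have cosine similarity -1/(K-1), i.e. the equiangular,
    maximally separated geometry. The matrix is meant to stay frozen.
    """
    assert feat_dim >= num_classes, "ETF construction needs feat_dim >= num_classes"
    # Orthonormal basis U with shape (feat_dim, num_classes).
    U, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))
    K = num_classes
    centering = torch.eye(K) - torch.ones(K, K) / K
    W = (K / (K - 1)) ** 0.5 * (U @ centering)      # (feat_dim, K)
    return W.t()                                     # (K, feat_dim)


def center_regularization_loss(features: torch.Tensor,
                               labels: torch.Tensor,
                               etf_weight: torch.Tensor,
                               ignore_index: int = 255) -> torch.Tensor:
    """Cross-entropy on within-class feature centers against the fixed ETF classifier.

    features: (N, D) per-pixel/point features, flattened over the batch.
    labels:   (N,)  ground-truth class indices.
    """
    valid = labels != ignore_index
    features, labels = features[valid], labels[valid]

    centers, targets = [], []
    for k in range(etf_weight.shape[0]):
        mask = labels == k
        if mask.any():
            centers.append(features[mask].mean(dim=0))   # within-class mean \bar{z}_k
            targets.append(k)
    if not centers:
        return features.new_zeros(())

    centers = torch.stack(centers)                        # (K_present, D)
    targets = torch.tensor(targets, device=features.device)
    logits = centers @ etf_weight.t()                     # classify the centers
    return F.cross_entropy(logits, targets)
```

In the framework described above, gradients from this loss flow back into the backbone through the feature centers, pulling the within-class means toward the ETF target while the fixed classifier itself is never updated.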

The total training loss is a weighted sum of the two branch losses: $\mathcal{L}_{\rm total} = \mathcal{L}_{\rm PR} + \lambda \mathcal{L}_{\rm CR}$, where $\lambda$ is a hyperparameter balancing the two terms.
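Continuing the sketch above, a training step would combine the two losses roughly as follows. A `model` that returns both dense features and per-pixel logits, and `lambda_ceco` standing in for $\lambda$, are assumptions of this illustration:

```python
# Hypothetical training step; assumes the helpers and imports from the previous sketch.
etf_weight = build_etf_classifier(num_classes=200, feat_dim=512)  # fixed, not trained


def training_step(model, images, gt, lambda_ceco):
    feats, logits = model(images)              # feats: (B, D, H, W), logits: (B, K, H, W)
    l_pr = F.cross_entropy(logits, gt, ignore_index=255)

    # Flatten dense features to (N, D) and labels to (N,) for the center branch.
    flat_feats = feats.permute(0, 2, 3, 1).reshape(-1, feats.shape[1])
    l_cr = center_regularization_loss(flat_feats, gt.reshape(-1),
                                      etf_weight.to(feats.device))
    return l_pr + lambda_ceco * l_cr           # total loss to back-propagate
```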

During inference, the entire Center Regularization Branch is discarded. Only the standard Point/Pixel Recognition Branch with its learned classifier is used for prediction. This ensures that CeCo adds no computational overhead during deployment.

Practical Benefits and Empirical Evidence:

The paper provides empirical evidence for the effectiveness and mechanics of CeCo:

  • Reduced Imbalance for Centers: By focusing on class centers rather than individual pixels/points, the effective imbalance factor is significantly reduced (e.g., from 37256 to 597 on ScanNet200), making the problem more tractable for standard optimization.
  • Improved Feature Geometry: Experiments show that models trained with CeCo exhibit feature centers significantly closer to the desired equiangular, maximally separated ETF structure than baseline models. This is visualized by the reduced standard deviation and shifted average of the pairwise cosines between feature centers (a diagnostic sketch follows this list).
  • Enhanced Performance for Minor Classes: CeCo consistently improves performance, particularly on the "Common" and "Tail" categories in imbalanced datasets like ScanNet200 and ADE20K. This aligns with the theoretical motivation that better-separated feature centers improve discrimination for less frequent classes.
  • Orthogonality with Other Losses: CeCo acts as a feature-level regularization and is shown to be orthogonal to and compatible with commonly used segmentation losses like Dice and Lovász losses, leading to further performance gains when combined.
  • Flexibility: CeCo is demonstrated to work effectively across various backbone architectures (CNNs like ResNet, HRNet, and Transformers like Swin, BEiT) and segmentation heads (UperNet, OCRNet, DeepLabV3+), on both 2D image and 3D point cloud semantic segmentation tasks.
  • State-of-the-Art Results: The method achieves state-of-the-art results on benchmarks like ScanNet200, significantly improving the mean IoU, especially on the tail classes.
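As a rough way to reproduce the geometry check referenced in the "Improved Feature Geometry" bullet, one can compute the pairwise cosines of the class centers and compare their mean and standard deviation to the ideal ETF value $-1/(K-1)$. The helper below is an illustrative diagnostic of our own, not taken from the paper's code:

```python
import torch
import torch.nn.functional as F


def center_cosine_stats(centers: torch.Tensor):
    """Mean and std of pairwise cosines between class centers.

    centers: (K, D), one feature center per class. For an exact simplex ETF the
    off-diagonal cosines all equal -1/(K-1), so the std is 0 and the mean hits
    that value.
    """
    normed = F.normalize(centers, dim=1)
    cos = normed @ normed.t()                            # (K, K) cosine matrix
    K = centers.shape[0]
    off_diag = cos[~torch.eye(K, dtype=torch.bool)]      # drop the diagonal entries
    return off_diag.mean().item(), off_diag.std().item(), -1.0 / (K - 1)
```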

Implementation Considerations:

  • Training Overhead: The addition of the Center Regularization Branch and the computation of feature centers within each training batch increases training time; the paper reports roughly a 10-20% per-batch overhead compared to baselines.
  • Hyperparameter Tuning: The weight $\lambda$ for the center collapse loss needs to be tuned, although experiments suggest that performance is consistently improved over a relatively wide range of $\lambda$ values.
  • Applicability: While effective for significantly imbalanced datasets with many classes, the paper notes that the benefits may be less pronounced for datasets with fewer classes or lower imbalance ratios (e.g., Cityscapes, or ScanNet v2 with 20 classes).

The authors have made their code available at https://github.com/dvlab-research/Imbalanced-Learning, allowing practitioners to integrate this method into their own semantic segmentation pipelines. This work translates theoretical insights from neural collapse into a practical, effective regularization technique for the challenging problem of imbalanced semantic segmentation.