Center-guided Classifier for Semantic Segmentation of Remote Sensing Images

Published 21 Mar 2025 in cs.CV | (2503.16963v1)

Abstract: Compared with natural images, remote sensing images (RSIs) have a unique characteristic, i.e., larger intraclass variance, which makes semantic segmentation for remote sensing images more challenging. Moreover, existing semantic segmentation models for remote sensing images usually employ a vanilla softmax classifier, which has three drawbacks: (1) non-direct supervision for the pixel representations during training; (2) inadequate modeling ability of parametric softmax classifiers under large intraclass variance; and (3) opaque process of classification decision. In this paper, we propose a novel classifier (called CenterSeg) customized for RSI semantic segmentation, which solves the abovementioned problems with multiple prototypes, direct supervision under the Grassmann manifold, and an interpretability strategy. Specifically, for each class, our CenterSeg obtains local class centers by aggregating corresponding pixel features based on ground-truth masks, and generates multiple prototypes through hard attention assignment and momentum updating. In addition, we introduce the Grassmann manifold and constrain the joint embedding space of pixel features and prototypes based on two additional regularization terms. In particular, during inference, CenterSeg can further provide interpretability to the model by restricting each prototype to be a sample of the training set. Experimental results on three remote sensing segmentation datasets validate the effectiveness of the model. Besides the superior performance, CenterSeg has the advantages of being simple, lightweight, compatible, and interpretable. Code is available at https://github.com/xwmaxwma/rssegmentation.


Authors (7)

Summary

Center-Guided Classifier for Semantic Segmentation of Remote Sensing Images

The paper "Center-guided Classifier for Semantic Segmentation of Remote Sensing Images" addresses a difficulty specific to semantic segmentation of remote sensing images (RSIs): pixels of the same class vary far more than in natural images. This large intraclass variance, attributed to complex spatial distributions, makes it harder to assign pixels to the correct category. The authors propose a classifier, CenterSeg, that replaces the conventional softmax classifier with three components: multiple prototypes per class, direct supervision under a Grassmann manifold, and an interpretability strategy.

The parametric softmax classifiers commonly used in RSI segmentation are criticized for three limitations: they supervise pixel representations only indirectly during training, they model large intraclass variance poorly, and their classification decisions are opaque. CenterSeg addresses these in two stages. First, for each class it aggregates the pixel features selected by the ground-truth mask into a local class center. Second, it maintains multiple prototypes per class, assigning each local center to a prototype by hard attention and updating that prototype with a momentum rule. In addition, Grassmann manifold constraints, expressed as two regularization terms, shape the joint embedding space of pixel features and prototypes.
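The two stages described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the use of cosine similarity for the hard assignment, and the momentum value of 0.9 are all assumptions made for the example.

```python
import numpy as np

def local_class_centers(features, mask, num_classes):
    """Average the pixel features of each class present in one image.

    features: (H, W, D) float array of pixel embeddings.
    mask:     (H, W) integer array of ground-truth class ids.
    Returns {class_id: (D,) center} for the classes that occur in the mask.
    """
    flat_feats = features.reshape(-1, features.shape[-1])
    flat_mask = mask.reshape(-1)
    centers = {}
    for c in range(num_classes):
        selected = flat_feats[flat_mask == c]
        if selected.shape[0] > 0:
            centers[c] = selected.mean(axis=0)
    return centers

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def update_prototypes(prototypes, centers, momentum=0.9):
    """Hard-assign each local center to one prototype, then update it.

    prototypes: (C, K, D) float array, K prototypes per class.
    Assumption: the assignment picks the prototype with the highest
    cosine similarity; the paper's exact rule may differ.
    """
    updated = prototypes.copy()
    for c, center in centers.items():
        sims = l2_normalize(updated[c]) @ l2_normalize(center)  # (K,)
        k = int(np.argmax(sims))  # hard assignment: exactly one prototype
        updated[c, k] = momentum * updated[c, k] + (1.0 - momentum) * center
    return updated
```

Only the selected prototype moves, and it moves slowly toward the new center, so different prototypes of one class can settle on different modes of that class's feature distribution.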

CenterSeg was evaluated on three RSI segmentation datasets, where it improved mIoU, F1 score, and overall accuracy (OA) over existing baseline models. Notably, the method exhibits strong results across diverse classes, especially in scenarios involving small and complex objects where traditional methods falter.
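For readers unfamiliar with the three metrics, the sketch below computes them from a confusion matrix using their standard definitions. The function names are hypothetical, and the choice to score a class that never appears as 0 is an assumption of this example; benchmark toolkits often exclude such classes or an ignore label instead.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = y_true.reshape(-1) * num_classes + y_pred.reshape(-1)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def segmentation_metrics(cm):
    """Return mean IoU, mean F1, and overall accuracy from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c but labeled otherwise
    fn = cm.sum(axis=1) - tp  # labeled class c but predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    oa = tp.sum() / max(cm.sum(), 1)
    return {"mIoU": float(iou.mean()), "mF1": float(f1.mean()), "OA": float(oa)}
```

Because mIoU and mean F1 average over classes while OA averages over pixels, a method that improves on small, rare classes tends to show a larger gain in mIoU than in OA.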

From a practical perspective, CenterSeg offers several advantages. It is simple to implement and does not demand extensive storage or computational resources. It is lightweight, introducing negligible additional parameters during inference, which makes it easy to integrate with existing segmentation methods. It is also interpretable: at inference time each prototype can be restricted to a sample from the training set, so a prediction can be traced back to a concrete training example. This is particularly useful in applications like urban planning and environmental monitoring, where understanding model predictions is crucial.
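One way to picture the interpretability strategy is to replace each learned prototype with the training feature closest to it. The sketch below does that for a single class. It is an assumption-laden illustration: `snap_to_training_samples` and the feature bank are hypothetical names, cosine similarity is an assumed distance, and the paper may select the training sample differently.

```python
import numpy as np

def snap_to_training_samples(prototypes, bank):
    """Replace each prototype with its most similar stored training feature.

    prototypes: (K, D) learned prototypes of one class.
    bank:       (N, D) features of training samples from the same class.
    Returns (snapped, indices): the (K, D) replacements and the (K,)
    positions in the bank they came from.
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

    sims = normalize(prototypes) @ normalize(bank).T  # (K, N) cosine similarities
    indices = sims.argmax(axis=1)
    return bank[indices], indices
```

The returned indices are what make a decision inspectable: a pixel matched to prototype k can be shown alongside the training sample at `indices[k]`.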

On the theoretical side, the approach targets a setting where traditional classifiers struggle: large intraclass variance in non-natural image datasets. Because the classifier is not tied to a particular backbone, its methodology could be adapted to similar tasks beyond remote sensing. CenterSeg may also inspire further research into classifier designs that are robust, interpretable, and efficient on complex imaging datasets. Overall, the paper offers a promising alternative to the softmax classifier for semantic segmentation of remote sensing images.