
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation (2403.01818v3)

Published 4 Mar 2024 in cs.CV and cs.AI

Abstract: Semi-supervised semantic segmentation (SSSS) has been proposed to alleviate the burden of time-consuming pixel-level manual labeling, which leverages limited labeled data along with larger amounts of unlabeled data. Current state-of-the-art methods train the labeled data with ground truths and unlabeled data with pseudo labels. However, the two training flows are separate, which allows labeled data to dominate the training process, resulting in low-quality pseudo labels and, consequently, sub-optimal results. To alleviate this issue, we present AllSpark, which reborns the labeled features from unlabeled ones with the channel-wise cross-attention mechanism. We further introduce a Semantic Memory along with a Channel Semantic Grouping strategy to ensure that unlabeled features adequately represent labeled features. The AllSpark shed new light on the architecture level designs of SSSS rather than framework level, which avoids increasingly complicated training pipeline designs. It can also be regarded as a flexible bottleneck module that can be seamlessly integrated into a general transformer-based segmentation model. The proposed AllSpark outperforms existing methods across all evaluation protocols on Pascal, Cityscapes and COCO benchmarks without bells-and-whistles. Code and model weights are available at: https://github.com/xmed-lab/AllSpark.


Summary

  • The paper introduces AllSpark, a transformer-based bottleneck module that reconstructs ("reborns") labeled features from unlabeled ones via channel-wise cross-attention for semi-supervised semantic segmentation.
  • It adds a first-in-first-out Semantic Memory and a Channel Semantic Grouping strategy so that the stored unlabeled features adequately represent the labeled features being reconstructed.
  • Experiments show consistent mIoU gains on PASCAL VOC 2012, Cityscapes, and COCO, with minimal changes to existing training pipelines.

Overview of "AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation"

The paper "AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation" addresses a notable challenge in the domain of semi-supervised semantic segmentation (SSSS). This challenge is the reliance on manual pixel-level labeling, which is not only labor-intensive but also a significant bottleneck in scaling segmentation models. Current state-of-the-art methods predominantly utilize pseudo labeling for unlabeled data, which segregates the training flows and causes the labeled data to dominate the training process. This practice often results in the generation of low-quality pseudo labels, leading to sub-optimal model performance.

Methodology

The authors introduce a novel approach termed "AllSpark," which integrates unlabeled features into the training flow of labeled features through a channel-wise cross-attention mechanism. This integration effectively "rebirths" the labeled features, allowing them to benefit from the more diverse and comprehensive perspectives offered by the unlabeled data. The key components of the proposed method, sketched in code after the list, are:

  1. Channel-Wise Cross-Attention Mechanism: This mechanism forms the core of the AllSpark module, leveraging contextual information from unlabeled data to reconstruct the labeled features and thus preventing the dominance of labeled data during training.
  2. Semantic Memory (S-Mem): To overcome the limitations posed by a single mini-batch of unlabeled data, the authors adopt a FIFO queue to store features of unlabeled data. This expands the available feature space and allows for a more robust reconstruction process.
  3. Channel-wise Semantic Grouping: This strategy keeps the semantic memory up to date by grouping feature channels according to their similarity with the class probability maps predicted for unlabeled images.
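
A minimal PyTorch-style sketch of the first two components, written only from the description above, follows. The class names, the single-head formulation, the flat token buffer, and the random sampling step are illustrative assumptions, not the authors' released implementation (the paper's S-Mem is organized via the Channel Semantic Grouping strategy rather than as a flat buffer).

```python
import torch
import torch.nn as nn


class ChannelWiseCrossAttention(nn.Module):
    """Queries come from labeled features; keys/values come from unlabeled
    features (or the semantic memory). The attention map is C x C, relating
    channels rather than spatial tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, f_labeled: torch.Tensor, f_unlabeled: torch.Tensor) -> torch.Tensor:
        # f_labeled:   (B, N, C) flattened tokens from labeled crops
        # f_unlabeled: (B, N, C) flattened tokens from unlabeled crops / memory
        # (assumes both carry the same number of tokens N)
        q = self.q_proj(f_labeled)
        k = self.k_proj(f_unlabeled)
        v = self.v_proj(f_unlabeled)

        n_tokens = q.shape[1]
        # Channel-to-channel affinity between labeled queries and unlabeled keys: (B, C, C)
        attn = torch.softmax(q.transpose(1, 2) @ k / n_tokens ** 0.5, dim=-1)

        # Each labeled channel is re-expressed as a mixture of unlabeled channels.
        reborn = (attn @ v.transpose(1, 2)).transpose(1, 2)  # (B, N, C)
        return f_labeled + self.out_proj(reborn)              # residual connection


class SemanticMemory:
    """First-in-first-out buffer of unlabeled token features, so keys/values are
    not limited to the unlabeled crops of the current mini-batch. This is a
    simplified flat token buffer; the paper's S-Mem is organized semantically."""

    def __init__(self, capacity_tokens: int = 4096):
        self.capacity = capacity_tokens
        self.buffer = None  # (M, C) with M <= capacity_tokens

    def enqueue(self, f_unlabeled: torch.Tensor) -> None:
        tokens = f_unlabeled.detach().flatten(0, 1)  # (B*N, C)
        self.buffer = tokens if self.buffer is None else torch.cat([self.buffer, tokens])
        self.buffer = self.buffer[-self.capacity:]   # FIFO: oldest rows are dropped

    def sample(self, batch_size: int, n_tokens: int) -> torch.Tensor:
        # Draw a token set matching the query shape so the C x C affinity is defined.
        idx = torch.randint(0, self.buffer.shape[0], (batch_size, n_tokens))
        return self.buffer[idx]  # (B, N, C)
```

In a training step one would enqueue the unlabeled batch's encoder features and rebuild the labeled batch's features with keys/values drawn from the memory, e.g. `cca(f_labeled, memory.sample(B, N))`.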

Results

The AllSpark module delivers strong results on the PASCAL VOC 2012, Cityscapes, and COCO benchmarks, outperforming existing SSSS methods under every evaluation protocol without significant alterations to existing training pipelines. The reported mIoU gains illustrate the effectiveness of the approach:

  • On the original PASCAL VOC 2012 split with 1/8 labeled data, AllSpark attains an mIoU of 78.41%, compared with the previous best of 77.19%.
  • On the Cityscapes dataset with 1/8 labeled data, the model reached 79.24% mIoU.
  • On the challenging COCO dataset, AllSpark consistently achieved higher mIoU scores across all labeling ratios.

Implications and Future Directions

The AllSpark approach shifts away from the prevailing framework-level paradigm by making architecture-level modifications that capitalize on transformers' capacity to leverage large amounts of unlabeled data. This design could streamline the training of foundation models in sparse labeling scenarios.
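
To make the architecture-level vs. framework-level distinction concrete, the sketch below shows how a bottleneck module in the spirit of AllSpark could sit between a generic transformer encoder and a segmentation decoder. The wrapper class, its call signature, and the skip-at-inference behavior are assumptions for illustration, not the released code.

```python
import torch.nn as nn


class SegmentorWithBottleneck(nn.Module):
    """Hypothetical wrapper: encoder -> bottleneck -> decoder. Only the bottleneck
    changes; the surrounding supervised / pseudo-label training loop stays as-is."""

    def __init__(self, encoder: nn.Module, bottleneck: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder        # e.g. a SegFormer-style transformer backbone
        self.bottleneck = bottleneck  # e.g. the channel-wise cross-attention block above
        self.decoder = decoder

    def forward(self, x_labeled, x_unlabeled=None):
        f_labeled = self.encoder(x_labeled)
        if self.training and x_unlabeled is not None:
            f_unlabeled = self.encoder(x_unlabeled)
            # Rebuild labeled features from unlabeled ones before decoding.
            f_labeled = self.bottleneck(f_labeled, f_unlabeled)
        return self.decoder(f_labeled)
```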

Given the results, AllSpark not only sets a new benchmark in semi-supervised semantic segmentation but also invites further exploration of channel-wise attention mechanisms at different levels of segmentation models. Future work could adapt the methodology to other semi-supervised tasks in computer vision or improve its computational efficiency for broader industrial use. The paper also raises questions about optimizing the semantic memory configuration and exploring alternative memory bank strategies.