ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation (2106.05095v2)

Published 9 Jun 2021 in cs.CV

Abstract: Self-training via pseudo labeling is a conventional, simple, and popular pipeline to leverage unlabeled data. In this work, we first construct a strong baseline of self-training (namely ST) for semi-supervised semantic segmentation via injecting strong data augmentations (SDA) on unlabeled images to alleviate overfitting noisy labels as well as decouple similar predictions between the teacher and student. With this simple mechanism, our ST outperforms all existing methods without any bells and whistles, e.g., iterative re-training. Inspired by the impressive results, we thoroughly investigate the SDA and provide some empirical analysis. Nevertheless, incorrect pseudo labels are still prone to accumulate and degrade the performance. To this end, we further propose an advanced self-training framework (namely ST++), that performs selective re-training via prioritizing reliable unlabeled images based on holistic prediction-level stability. Concretely, several model checkpoints are saved in the first stage supervised training, and the discrepancy of their predictions on the unlabeled image serves as a measurement for reliability. Our image-level selection offers holistic contextual information for learning. We demonstrate that it is more suitable for segmentation than common pixel-wise selection. As a result, ST++ further boosts the performance of our ST. Code is available at https://github.com/LiheYoung/ST-PlusPlus.

Analysis of the ST++ Framework for Enhanced Semi-supervised Semantic Segmentation

The paper "ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation" introduces an innovative approach to enhance self-training in the context of semi-supervised semantic segmentation. This work specifically addresses the challenges inherent in leveraging unlabeled data for training, a critical issue given the resource-intensive nature of obtaining pixel-wise annotations for fully supervised models.

Key Contributions

  1. Strong Data Augmentations (SDA): The authors build a strong self-training baseline, named ST, by injecting strong data augmentations on unlabeled images. Unlike approaches that rely on iterative re-training or threshold-based filtering, ST applies aggressive transformations such as color jitter, grayscale conversion, Gaussian blur, and Cutout to prevent overfitting to incorrect pseudo labels and to decouple the predictions of the teacher and student models (see the augmentation sketch after this list).
  2. Selective Re-training (ST++): Building on ST, the paper proposes ST++, which selects reliable unlabeled images based on how stable their pseudo-label predictions are across several model checkpoints saved during the first-stage supervised training. This image-level reliability score enables a curriculum: more reliable images are pseudo-labeled and re-trained on first, while harder images are introduced in a later round (a scoring sketch follows the augmentation example below).
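The sketch below shows how such a strong augmentation pipeline for unlabeled images might look in PyTorch/torchvision. The probabilities, jitter magnitudes, kernel size, and Cutout size are illustrative assumptions, not the authors' exact settings.

```python
# A minimal sketch of strong data augmentations (SDA) for unlabeled images,
# assuming a torchvision-based pipeline; hyperparameters are illustrative.
import random
from torchvision import transforms

def build_strong_augmentation(p_jitter=0.8, p_gray=0.2, p_blur=0.5):
    """Color jitter, grayscale, and blur applied only to unlabeled images."""
    return transforms.Compose([
        transforms.RandomApply(
            [transforms.ColorJitter(0.5, 0.5, 0.5, 0.25)], p=p_jitter),
        transforms.RandomGrayscale(p=p_gray),
        transforms.RandomApply(
            [transforms.GaussianBlur(kernel_size=23)], p=p_blur),
    ])

def cutout(img_tensor, pseudo_mask, size=64, ignore_index=255):
    """Zero out a random square in the image and mark the same region as
    ignored in the pseudo-label mask so it does not contribute to the loss."""
    _, h, w = img_tensor.shape
    y = random.randint(0, h - size)
    x = random.randint(0, w - size)
    img_tensor[:, y:y + size, x:x + size] = 0
    pseudo_mask[y:y + size, x:x + size] = ignore_index
    return img_tensor, pseudo_mask
```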

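The following is a minimal sketch of the image-level reliability scoring behind selective re-training, assuming a list of segmentation checkpoints saved during the supervised stage (the last being the final model) and a loader that yields unlabeled images with their identifiers. The helper names are hypothetical and not taken from the official repository.

```python
# Rank unlabeled images by prediction stability across saved checkpoints:
# images whose earlier-checkpoint pseudo masks agree with the final model's
# are treated as more reliable.
import torch

def mask_iou(pred_a, pred_b, num_classes):
    """Mean IoU between two predicted label maps (higher = more consistent)."""
    ious = []
    for c in range(num_classes):
        a, b = pred_a == c, pred_b == c
        union = (a | b).sum().item()
        if union == 0:
            continue
        ious.append((a & b).sum().item() / union)
    return sum(ious) / max(len(ious), 1)

@torch.no_grad()
def rank_unlabeled_by_stability(checkpoints, unlabeled_loader, num_classes):
    """Score each unlabeled image by how much earlier checkpoints agree with
    the final checkpoint, then sort from most to least reliable."""
    for m in checkpoints:
        m.eval()
    final_model, earlier = checkpoints[-1], checkpoints[:-1]
    scores = []
    for image, image_id in unlabeled_loader:
        final_pred = final_model(image).argmax(dim=1)
        stability = sum(
            mask_iou(m(image).argmax(dim=1), final_pred, num_classes)
            for m in earlier
        ) / len(earlier)
        scores.append((image_id, stability))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

In a full ST++ pipeline, the top-ranked portion of the unlabeled set would be pseudo-labeled and used for an initial round of re-training, with the remaining images added in a later round, mirroring the curriculum described above.
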
Numerical Results

The reported results show that both ST and ST++ outperform existing methods across several datasets and architectural configurations. On Pascal VOC 2012, even with a minimal labeled set, both frameworks achieve mIoU scores surpassing prior state-of-the-art methods, underscoring their effectiveness in low-label regimes. On Cityscapes, ST and ST++ remain superior regardless of the backbone, exceeding existing methods even with the less powerful ResNet-50 backbone while competitors use ResNet-101.

Implications and Speculations

The introduction of ST++ has several profound implications. Practically, it provides a more scalable and efficient means of training semantic segmentation models by minimizing the reliance on costly labeled data. Theoretically, the findings suggest that more sophisticated forms of pseudo-labeling, when coupled with adequate data augmentation and sample selection strategies, can lead to significant improvements in model generalization. This framework could inspire further inquiries into designing self-training schemes that constrain overfitting and improve model robustness against label noise in semi-supervised settings.

Future Directions

The research opens several avenues for future exploration. One direction is to integrate adaptive augmentation strategies that adjust dynamically based on model feedback. Another is to combine ST++ with domain adaptation techniques to handle the distribution shifts that often arise in real-world deployment. Exploring the relationship between pseudo-label reliability and different network architectures may also yield task-specific enhancements that further boost performance.

Conclusion

This paper advances self-training for semi-supervised semantic segmentation by combining strong data augmentations with reliability-based sample prioritization, offering a compelling alternative to existing methods. The approach achieves strong performance and suggests new directions for research on learning efficiently from limited labeled data. Its contributions are likely to influence both academic research and practical applications where annotation budgets are tight.

Authors (5)
  1. Lihe Yang (12 papers)
  2. Wei Zhuo (24 papers)
  3. Lei Qi (84 papers)
  4. Yinghuan Shi (79 papers)
  5. Yang Gao (761 papers)
Citations (243)