
Semi-Supervised Semantic Segmentation with Cross-Consistency Training (2003.09005v3)

Published 19 Mar 2020 in cs.CV

Abstract: In this paper, we present a novel cross-consistency based semi-supervised approach for semantic segmentation. Consistency training has proven to be a powerful semi-supervised learning framework for leveraging unlabeled data under the cluster assumption, in which the decision boundary should lie in low-density regions. In this work, we first observe that for semantic segmentation, the low-density regions are more apparent within the hidden representations than within the inputs. We thus propose cross-consistency training, where an invariance of the predictions is enforced over different perturbations applied to the outputs of the encoder. Concretely, a shared encoder and a main decoder are trained in a supervised manner using the available labeled examples. To leverage the unlabeled examples, we enforce a consistency between the main decoder predictions and those of the auxiliary decoders, taking as inputs different perturbed versions of the encoder's output, and consequently, improving the encoder's representations. The proposed method is simple and can easily be extended to use additional training signal, such as image-level labels or pixel-level labels across different domains. We perform an ablation study to tease apart the effectiveness of each component, and conduct extensive experiments to demonstrate that our method achieves state-of-the-art results in several datasets.

Authors (3)
  1. Yassine Ouali (10 papers)
  2. Céline Hudelot (50 papers)
  3. Myriam Tami (18 papers)
Citations (661)

Summary

Semi-Supervised Semantic Segmentation with Cross-Consistency Training: An Overview

The paper "Semi-Supervised Semantic Segmentation with Cross-Consistency Training" introduces a novel approach to enhance semantic segmentation in a semi-supervised manner. The proposed method leverages cross-consistency training (CCT) to improve model performance by utilizing both labeled and unlabeled data. This approach applies perturbations to the encoder's output, aligning with the cluster assumption for robust decision boundary placement in low-density regions of feature space.

Methodology

The architecture comprises a shared encoder, a main decoder, and several auxiliary decoders. On labeled data, the encoder and main decoder are trained with a standard supervised segmentation loss. On unlabeled data, each auxiliary decoder receives a differently perturbed version of the encoder's output and is trained to match the main decoder's prediction; the gradients of this consistency loss flow back into the encoder, encouraging representations that are invariant to the perturbations. The perturbations explored include feature-based ones (F-Drop, F-Noise), prediction-guided ones (Obj-Msk, Con-Msk, G-Cutout, I-VAT), and a random one (DropOut).
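
The following is a minimal PyTorch sketch of a single CCT training step under these assumptions: a segmentation `encoder`, a `main_decoder`, a list of `aux_decoders` (one per perturbation), and simplified versions of the F-Noise and F-Drop perturbations. All names and perturbation details are illustrative, not the authors' implementation.

```python
# Illustrative CCT training step (a sketch, not the authors' code).
import torch
import torch.nn.functional as F

def f_noise(z, gamma=0.3):
    """F-Noise (simplified): add uniform noise scaled by the features."""
    noise = z.new_empty(z.shape).uniform_(-gamma, gamma)
    return z + z.detach() * noise

def f_drop(z, keep_frac=0.7):
    """F-Drop (simplified): zero out the most activated spatial regions."""
    attention = z.mean(dim=1, keepdim=True)                   # (N, 1, H, W)
    thresh = attention.flatten(1).quantile(keep_frac, dim=1)  # per image
    return z * (attention < thresh.view(-1, 1, 1, 1)).float()

def cct_step(encoder, main_decoder, aux_decoders, x_l, y_l, x_u, w_u):
    # Supervised branch: labeled images through encoder + main decoder.
    loss_sup = F.cross_entropy(main_decoder(encoder(x_l)), y_l)

    # Unsupervised branch: the main decoder's prediction is the target
    # (detached); each auxiliary decoder sees a perturbed encoder output.
    z_u = encoder(x_u)
    with torch.no_grad():
        target = torch.softmax(main_decoder(z_u), dim=1)
    perturbations = [f_noise, f_drop]  # one perturbation per aux decoder
    loss_cons = sum(
        F.mse_loss(torch.softmax(dec(p(z_u)), dim=1), target)
        for dec, p in zip(aux_decoders, perturbations)
    ) / len(aux_decoders)

    # Because the target is detached, consistency gradients reach the
    # encoder and auxiliary decoders but not the main decoder.
    return loss_sup + w_u * loss_cons
```

The paper uses many more auxiliary decoders and perturbation types than shown here, and ramps the unsupervised weight up over the early training iterations.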

Experiments and Results

Extensive experiments show that the approach achieves state-of-the-art results on several benchmark datasets, including PASCAL VOC, Cityscapes, and CamVid. The performance gains hold across different levels of labeled-data availability, with CCT consistently outperforming baseline models and traditional consistency training applied in input space.

  1. Accuracy and Performance:
    • In semi-supervised settings, the method yields consistent mIoU improvements over supervised baselines and over consistency training applied directly to the inputs.
    • When applied in domain adaptation scenarios (e.g., Cityscapes and SUN RGB-D), CCT demonstrates robust feature alignment between different domains, outperforming previous works.
  2. Flexibility and Extension:
    • CCT's design extends easily to additional sources of supervision, such as image-level labels, without the different training signals conflicting (a minimal sketch follows this list).
    • The framework is adaptable, facilitating domain adaptation tasks and handling non-overlapping label spaces efficiently.
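
To make the extension point concrete, below is a hypothetical sketch of a weak-supervision branch: a classification head on globally pooled encoder features, trained with binary cross-entropy against multi-hot image-level labels and added alongside the CCT losses. `WeakBranch` and its wiring are assumptions for illustration; the paper's actual weakly-supervised branch differs in its details.

```python
# Hypothetical weak-supervision branch (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeakBranch(nn.Module):
    """Multi-label classification head over pooled encoder features."""
    def __init__(self, feat_channels, num_classes):
        super().__init__()
        self.classifier = nn.Linear(feat_channels, num_classes)

    def forward(self, z):
        pooled = F.adaptive_avg_pool2d(z, 1).flatten(1)  # (N, C)
        return self.classifier(pooled)                   # (N, num_classes)

def weak_loss(weak_branch, z_w, image_labels):
    """BCE against multi-hot image-level labels of shape (N, num_classes)."""
    return F.binary_cross_entropy_with_logits(
        weak_branch(z_w), image_labels.float()
    )
```

Because such a head shares the encoder with the segmentation decoders, image-level labels can supply an additional training signal to the shared representation without requiring pixel-level annotations.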

Implications and Future Directions

The implications of this paper are substantial, as it provides a scalable solution for scenarios with limited labeled data. The method's ability to leverage unlabeled data through consistency enforcement paves the way for more efficient semantic segmentation frameworks. Future research could explore extending CCT to other visual tasks or incorporating novel perturbation techniques to further enhance model resilience and accuracy.

In conclusion, this paper presents a well-rounded and methodologically sound framework for semi-supervised semantic segmentation, showcasing the potential of cross-consistency training in enhancing model performance while maintaining flexibility and scalability across different datasets and domain configurations.