Adversarial Learning for Semi-Supervised Semantic Segmentation (1802.07934v2)

Published 22 Feb 2018 in cs.CV

Abstract: We propose a method for semi-supervised semantic segmentation using an adversarial network. While most existing discriminators are trained to classify input images as real or fake on the image level, we design a discriminator in a fully convolutional manner to differentiate the predicted probability maps from the ground truth segmentation distribution with the consideration of the spatial resolution. We show that the proposed discriminator can be used to improve semantic segmentation accuracy by coupling the adversarial loss with the standard cross entropy loss of the proposed model. In addition, the fully convolutional discriminator enables semi-supervised learning through discovering the trustworthy regions in predicted results of unlabeled images, thereby providing additional supervisory signals. In contrast to existing methods that utilize weakly-labeled images, our method leverages unlabeled images to enhance the segmentation model. Experimental results on the PASCAL VOC 2012 and Cityscapes datasets demonstrate the effectiveness of the proposed algorithm.

Summary

The paper introduces an adversarial framework in which a fully convolutional discriminator supplies an adversarial loss that is added to the standard cross-entropy loss to improve segmentation accuracy.
It supports semi-supervised learning by using the discriminator's confidence maps to select trustworthy regions of predictions on unlabeled images, which are then trained with a masked cross-entropy loss.
Experiments on PASCAL VOC 2012 and Cityscapes show consistent mean IU (mean intersection-over-union) improvements over the baseline, with no extra cost at inference.

Adversarial Learning for Semi-Supervised Semantic Segmentation

"Adversarial Learning for Semi-Supervised Semantic Segmentation" presents a novel approach employing adversarial networks to enhance semantic segmentation in a semi-supervised context. The authors propose a methodology utilizing a fully convolutional discriminator to differentiate predicted probability maps from ground truth segmentation distributions. By integrating adversarial loss with standard cross-entropy, the approach aims to refine segmentation accuracy effectively.

Methodology

The primary innovation lies in the design of the discriminator. Most existing discriminators classify a whole input as real or fake; this one is fully convolutional, so it outputs a real-or-fake decision at each spatial location and distinguishes predicted probability maps from ground truth label maps with spatial resolution taken into account. Adding the resulting adversarial loss to the standard cross-entropy loss gives the segmentation model an extra training signal.
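A minimal sketch of the per-pixel discriminator objective described above, using NumPy. The function name, the averaging over pixels, and the clipping constant are assumptions made for illustration; the discriminator network itself is not modeled, only the loss applied to its output map.

```python
import numpy as np

def discriminator_loss(conf_map, is_ground_truth, eps=1e-8):
    """Per-pixel binary cross-entropy for a spatial discriminator.

    conf_map: (H, W) array of discriminator outputs in (0, 1); each entry is
        the estimated probability that the pixel comes from a ground truth
        label map rather than from the segmentation network.
    is_ground_truth: True if the discriminator input was a ground truth map,
        False if it was a predicted probability map.
    """
    conf_map = np.clip(conf_map, eps, 1.0 - eps)  # avoid log(0)
    if is_ground_truth:
        # Ground truth input: every pixel should be scored close to 1.
        return float(-np.log(conf_map).mean())
    # Predicted input: every pixel should be scored close to 0.
    return float(-np.log(1.0 - conf_map).mean())
```

Because the loss is evaluated at every location, the discriminator can be right in some regions of a prediction and wrong in others, which is what later makes its output usable as a confidence map.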

The authors treat the segmentation network as the generator in a GAN framework: given an input image, it outputs per-class probability maps. The adversarial loss encourages these outputs to resemble ground truth label maps across spatial locations. In this respect it plays a role similar to probabilistic graphical models such as CRFs, but it requires no additional post-processing at test time. The discriminator is used only during training and is not needed at inference, so it adds no computational cost at test time.
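The labeled-data objective for the segmentation network can be sketched as follows, assuming NumPy arrays in place of network outputs. The helper names and the default weight `lambda_adv=0.01` are illustrative assumptions, not values taken from the text above.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-8):
    """Mean per-pixel cross-entropy.

    probs: (H, W, C) softmax output of the segmentation network.
    labels: (H, W) integer class ids from the ground truth.
    """
    # Pick the predicted probability of the true class at each pixel.
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())

def adversarial_loss(conf_map, eps=1e-8):
    """Push the discriminator's per-pixel score on a prediction toward 1."""
    return float(-np.log(np.clip(conf_map, eps, 1.0)).mean())

def segmentation_loss(probs, labels, conf_map, lambda_adv=0.01):
    """Cross-entropy plus a weighted adversarial term (labeled images)."""
    return cross_entropy(probs, labels) + lambda_adv * adversarial_loss(conf_map)
```

Here `conf_map` stands for the discriminator's output on the prediction `probs`. At inference only the segmentation network runs, so none of these loss terms are evaluated.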

Semi-Supervised Strategy

The semi-supervised scheme uses unlabeled images as an additional source of supervision. For an unlabeled image, the discriminator's output on the predicted probability maps serves as a confidence map. Regions with high confidence are treated as trustworthy, and the network's own predictions in those regions act as targets for a masked cross-entropy loss, in a self-taught manner.
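The masked cross-entropy step could look roughly like this in NumPy. The function name, the threshold value `0.2`, and the normalization by the number of trusted pixels are assumptions for illustration only.

```python
import numpy as np

def masked_self_training_loss(probs, conf_map, threshold=0.2, eps=1e-8):
    """Cross-entropy against the network's own predictions, restricted to
    pixels the discriminator considers trustworthy.

    probs: (H, W, C) softmax output for an unlabeled image.
    conf_map: (H, W) discriminator output for that prediction.
    threshold: hypothetical confidence cutoff; pixels at or below it are ignored.
    """
    pseudo_labels = probs.argmax(axis=-1)   # (H, W) self-generated targets
    mask = conf_map > threshold             # trusted pixels only
    if not mask.any():
        return 0.0                          # nothing trustworthy in this image
    picked = np.take_along_axis(probs, pseudo_labels[..., None], axis=-1)[..., 0]
    pixel_loss = -np.log(np.clip(picked, eps, 1.0))
    return float(pixel_loss[mask].mean())
```

In a real training loop the pseudo-labels and the mask would be treated as constants, so gradients flow only through `probs`.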

The adversarial loss is also applied to unlabeled images, pushing the model to produce segmentation outputs that resemble the ground truth distribution even where no labels exist. Combining the adversarial loss with confidence-masked self-training is the framework's core contribution, and it improves segmentation accuracy without adding any cost at inference.
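One way to write the overall objective described in this section is given below. The notation is introduced here for illustration: S is the segmentation network, D the discriminator, X_n an input image, Y_n its one-hot ground truth, \hat{Y}_n the one-hot argmax of S(X_n), T a confidence threshold, and \lambda_{adv}, \lambda_{semi} weighting factors. Under these assumptions, the cross-entropy term applies only to labeled images, the masked term only to unlabeled images, and the adversarial term to both.

```latex
\begin{aligned}
\mathcal{L}_{seg}  &= \mathcal{L}_{ce} + \lambda_{adv}\,\mathcal{L}_{adv} + \lambda_{semi}\,\mathcal{L}_{semi} \\
\mathcal{L}_{ce}   &= -\sum_{h,w}\sum_{c} Y_n^{(h,w,c)} \log S(X_n)^{(h,w,c)} \\
\mathcal{L}_{adv}  &= -\sum_{h,w} \log D\big(S(X_n)\big)^{(h,w)} \\
\mathcal{L}_{semi} &= -\sum_{h,w}\sum_{c} \mathbb{1}\big[D(S(X_n))^{(h,w)} > T\big]\,
                      \hat{Y}_n^{(h,w,c)} \log S(X_n)^{(h,w,c)}
\end{aligned}
```

The indicator and \hat{Y}_n are treated as constants during optimization, so the masked term reduces to an ordinary cross-entropy over the selected pixels.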

Experimental Results

Experiments conducted on the PASCAL VOC 2012 and Cityscapes datasets substantiate the proposed algorithm's effectiveness. Using varying labeled data subsets, the method consistently outperforms baseline models and demonstrates notable gains in mean IU scores when combined with the proposed semi-supervised learning strategy.
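For reference, mean IU (mean intersection-over-union) can be computed from a confusion matrix as in the sketch below. This is a generic NumPy implementation under simple assumptions, not the benchmarks' official evaluation code; in particular, it does not handle "ignore" labels.

```python
import numpy as np

def mean_iu(pred, target, num_classes):
    """Mean intersection-over-union across classes that appear in either
    the prediction or the ground truth.

    pred, target: integer class ids of the same shape, values in [0, num_classes).
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # confusion[i, j] counts pixels with true class i predicted as class j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both pred and target
    return float((intersection[valid] / union[valid]).mean())
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, the per-class IU values are 1/2 and 2/3, so the mean IU is 7/12.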

For instance, when only one-eighth of the PASCAL VOC 2012 training data is labeled, the mean IU improves from the 66% baseline to 69.5% with combined adversarial and semi-supervised training, a gain of 3.5 percentage points. The improvement holds across the tested settings, which indicates that the framework makes effective use of both labeled and unlabeled data.

Comparisons and Ablation Study

The paper compares the method with existing state-of-the-art models, underscoring the advantages of a fully convolutional discriminator and an adversarial approach tailored to high spatial resolution predictions. An ablation study examines the contribution of each component; in particular, adding the adversarial discriminator improves performance over training with cross-entropy loss alone.

Implications and Future Directions

The integration of adversarial training schemes presents a promising direction for semi-supervised learning in semantic segmentation. By efficiently leveraging unlabeled data, the method could reduce dependency on costly per-pixel annotations.

Future research may explore deeper discriminator architectures or alternative adversarial losses tailored for segmentation tasks. Additionally, expanding the framework to more complex scenes or domains could test its adaptability and further its applicability in real-world scenarios.

In summary, this research contributes a significant step forward by harmonizing adversarial learning with semi-supervised methods, offering empirical improvements substantiated by rigorous experimental validation.
