Semi-Supervised Semantic Segmentation with High- and Low-level Consistency (1908.05724v1)

Published 15 Aug 2019 in cs.CV

Abstract: The ability to understand visual information from limited labeled data is an important aspect of machine learning. While image-level classification has been extensively studied in a semi-supervised setting, dense pixel-level classification with limited data has only drawn attention recently. In this work, we propose an approach for semi-supervised semantic segmentation that learns from limited pixel-wise annotated samples while exploiting additional annotation-free images. It uses two network branches that link semi-supervised classification with semi-supervised segmentation including self-training. The dual-branch approach reduces both the low-level and the high-level artifacts typical when training with few labels. The approach attains significant improvement over existing methods, especially when trained with very few labeled samples. On several standard benchmarks - PASCAL VOC 2012, PASCAL-Context, and Cityscapes - the approach achieves new state-of-the-art in semi-supervised learning.

Authors (3)

Sudhanshu Mittal
Maxim Tatarchenko
Thomas Brox

Summary

An Expert Overview of "Semi-Supervised Semantic Segmentation with High- and Low-level Consistency"

The paper, by Sudhanshu Mittal, Maxim Tatarchenko, and Thomas Brox, proposes a semi-supervised method for semantic segmentation that learns from a small set of pixel-wise annotated images together with a larger pool of unlabeled images. Its dual-branch design targets the artifacts that typically appear when labeled data is scarce, and it reaches new state-of-the-art results on several standard benchmarks.

Summary of Approach

The authors address the issue of limited labeled data by introducing a dual-branch system that incorporates both high-level and low-level consistency in the segmentation process. This method unifies semi-supervised image classification with semantic segmentation, facilitated by two network branches:

s4GAN Branch: This branch is a semi-supervised segmentation framework built on a Generative Adversarial Network (GAN). The segmentation network acts as the generator, and a discriminator network learns to distinguish ground-truth segmentation maps from generated ones. Training is refined with a feature matching loss and a self-training mechanism that uses the discriminator's scores to select predictions on unlabeled images.
MLMT Branch: The second branch is a Multi-Label Mean Teacher network that supplies high-level semantic information. It performs semi-supervised multi-label image classification, so that the classes present in the segmentation output agree with the classes actually present in the image.
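
The two s4GAN ingredients named above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the mean-absolute-difference form of the feature matching term, and the threshold value 0.6 are all hypothetical choices made for the example.

```python
from statistics import fmean


def feature_matching_loss(real_feats, fake_feats):
    """Mean absolute gap between batch-averaged discriminator features.

    Each argument is a list of per-sample feature vectors of equal length.
    The absolute-difference form is an assumption made for this sketch.
    """
    real_mean = [fmean(column) for column in zip(*real_feats)]
    fake_mean = [fmean(column) for column in zip(*fake_feats)]
    return fmean(abs(r - f) for r, f in zip(real_mean, fake_mean))


def select_for_self_training(disc_scores, threshold=0.6):
    """Return indices of unlabeled samples the discriminator rates as realistic.

    Predictions for these samples would be reused as pseudo-labels.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    return [i for i, score in enumerate(disc_scores) if score > threshold]


if __name__ == "__main__":
    real = [[1.0, 1.0], [1.0, 1.0]]
    fake = [[0.0, 0.0], [0.0, 0.0]]
    print(feature_matching_loss(real, fake))
    print(select_for_self_training([0.2, 0.9, 0.7]))
```

In a real training loop the feature vectors would come from an intermediate discriminator layer, and the selected predictions would feed a supervised loss on the unlabeled batch.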

Numerical and Empirical Results

The dual-branch approach is evaluated on three standard benchmarks: PASCAL VOC 2012, PASCAL-Context, and Cityscapes. On these datasets it outperforms existing methods, with the largest gains when very few labeled samples are available:

On the PASCAL VOC 2012 dataset, the method reports an improvement of 11% over the previous state of the art when only 2% of the data is labeled.
The method can also exploit image-level weak annotations when they are available, and in that setting it delivers superior results without CRF post-processing.

Contribution to the Field and Future Directions

Combining a GAN-based segmentation branch with a mean teacher classification branch lets the method enforce consistency at two levels: the s4GAN branch targets low-level artifacts in the segmentation maps, while the MLMT branch targets high-level errors such as predicting classes that are not present in the image. Together they address the segmentation inaccuracies that commonly arise when few labels are available.
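
To make the high-level side concrete, the sketch below shows the standard mean teacher weight update (an exponential moving average of the student's weights) and one plausible way a classifier's output could constrain a segmentation prediction. Both functions, the smoothing factor `alpha`, the cutoff `tau`, and the gating rule itself are assumptions made for illustration; the summary does not specify how the two branches are fused.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the student weight (mean teacher step).

    teacher and student map parameter names to floats; alpha is hypothetical.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def gated_label(pixel_scores, class_probs, tau=0.5):
    """Pick one pixel's label among classes the image classifier kept.

    pixel_scores: per-class segmentation scores for a single pixel.
    class_probs:  per-class image-level probabilities from the classifier.
    If no class passes the cutoff, fall back to the ungated argmax.
    """
    kept = [c for c, p in enumerate(class_probs) if p > tau]
    candidates = kept or range(len(pixel_scores))
    return max(candidates, key=lambda c: pixel_scores[c])


if __name__ == "__main__":
    print(ema_update({"w": 1.0}, {"w": 0.0}, alpha=0.9))
    # Class 1 scores highest for this pixel, but the classifier rejects it.
    print(gated_label([0.2, 0.7, 0.1], [0.9, 0.1, 0.8]))
```

The fallback to the ungated argmax is a design choice of this sketch, so that a pixel always receives a label even when the classifier is unsure about every class.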

The paper makes a case for further work on semi-supervised frameworks, especially in domains where acquiring fully annotated datasets is impractical. It also opens research avenues on combining GANs with other consistency-based semi-supervised learning methods. Future work could refine the self-training process and explore applications beyond standard image datasets.

In conclusion, the paper introduces a semi-supervised semantic segmentation technique that is particularly effective when labeled training data is limited. Its dual-branch design may inform subsequent model architectures and training methodologies in computer vision.
