Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach (1911.08039v1)

Published 19 Nov 2019 in cs.CV

Abstract: Weakly supervised semantic segmentation is a challenging task as it only takes image-level information as supervision for training but produces pixel-level predictions for testing. To address such a challenging task, most recent state-of-the-art approaches propose to adopt two-step solutions, i.e., 1) learn to generate pseudo pixel-level masks, and 2) engage FCNs to train the semantic segmentation networks with the pseudo masks. However, the two-step solutions usually employ many bells and whistles in producing high-quality pseudo masks, making this kind of methods complicated and inelegant. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network to learn to predict segmentation maps. Concretely, we firstly leverage an image classification branch to generate class activation maps for the annotated categories, which are further pruned into confident yet tiny object/background regions. Such reliable regions are then directly served as ground-truth labels for the parallel segmentation branch, where a newly designed dense energy loss function is adopted for optimization. Despite its apparent simplicity, our one-step solution achieves competitive mIoU scores (val: 62.6, test: 62.9) on Pascal VOC compared with those two-step state-of-the-arts. By extending our one-step method to two-step, we get a new state-of-the-art performance on the Pascal VOC (val: 66.3, test: 66.5).


Weakly Supervised Semantic Segmentation via Reliable Region Mining

This paper, titled "Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach," presents a method for weakly supervised semantic segmentation that uses image-level labels to generate reliable pixel-level annotations. The authors propose an end-to-end network, termed Reliable Region Mining (RRM), as an alternative to the prevalent two-step solutions, which often rely on elaborate procedures to create pseudo-labels.

Methodology Overview

The proposed framework integrates two parallel branches into a unified network, utilizing the same backbone architecture:

Classification Branch: This branch generates class activation maps (CAMs) which are then processed to identify reliable pixel-level regions. These regions are refined further through a dense CRF process to eliminate noise, leaving only high-confidence annotations. Compared to traditional methods, which aim for denser object region mining, this approach focuses on accuracy through careful pruning of the CAMs.
Semantic Segmentation Branch: This branch employs the refined labels from the classification branch as ground-truth data to optimize the segmentation network. A distinct feature of this branch is its loss function, which combines cross-entropy loss with a novel dense energy loss. This combination allows the network to incorporate constraints from both labeled and unlabeled regions, utilizing shallow features, such as color and spatial information, across the entire image.
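
The interplay of the two branches can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the threshold values (`fg_thresh`, `bg_thresh`), the kernel bandwidths (`sigma_xy`, `sigma_rgb`), and the ignore label 255 are all assumptions chosen for illustration, the dense CRF refinement step is omitted, and the pairwise term is a simplified Potts-style energy over position and color rather than the paper's exact dense energy loss.

```python
import numpy as np

IGNORE = 255  # assumed label for pixels that are not reliable enough to supervise


def mine_reliable_labels(cams, fg_thresh=0.5, bg_thresh=0.05):
    """Prune CAMs into a sparse label map (thresholds are illustrative).

    cams: (C, H, W) non-negative activations for the classes present in the image.
    Returns an (H, W) map: 0 = background, 1..C = class, IGNORE = unreliable.
    """
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    norm = cams / np.maximum(peak, 1e-8)[:, None, None]  # each class map scaled to [0, 1]
    best = norm.max(axis=0)
    labels = np.full(best.shape, IGNORE, dtype=np.int64)
    confident_fg = best >= fg_thresh
    labels[confident_fg] = norm.argmax(axis=0)[confident_fg] + 1
    labels[best <= bg_thresh] = 0  # no class responds here: confident background
    return labels


def masked_cross_entropy(logits, labels):
    """Cross-entropy averaged over reliable pixels only. logits: (K, H, W)."""
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    mask = labels != IGNORE
    if not mask.any():
        return 0.0
    rows, cols = np.nonzero(mask)
    return float(-log_prob[labels[mask], rows, cols].mean())


def dense_energy(probs, image, sigma_xy=3.0, sigma_rgb=10.0):
    """Simplified pairwise energy over ALL pixels, labeled or not.

    Penalizes nearby, similarly colored pixels whose predictions disagree.
    probs: (K, H, W) softmax output; image: (H, W, 3). Cost is O((H*W)^2),
    so this is only meant for tiny inputs.
    """
    K, H, W = probs.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    rgb = image.reshape(-1, 3).astype(np.float64)
    d_pos = ((pos[:, None] - pos[None]) ** 2).sum(-1)
    d_rgb = ((rgb[:, None] - rgb[None]) ** 2).sum(-1)
    kernel = np.exp(-d_pos / (2 * sigma_xy ** 2) - d_rgb / (2 * sigma_rgb ** 2))
    np.fill_diagonal(kernel, 0.0)  # exclude the i == j pairs
    p = probs.reshape(K, -1)
    disagreement = 1.0 - p.T @ p  # probability that pixels i and j get different classes
    return float((kernel * disagreement).sum() / (H * W))
```

In a joint objective, the segmentation loss would be the sum of `masked_cross_entropy` on the mined labels and a weighted `dense_energy` term; the weighting is a further assumption not specified in this summary. The cross-entropy term only sees the sparse reliable pixels, while the pairwise term spreads supervision to the unlabeled ones.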

Evaluation and Results

The RRM approach is evaluated on the PASCAL VOC 2012 dataset, where it achieves mean Intersection over Union (mIoU) scores of 62.6 on the validation set and 62.9 on the test set. These results are competitive with state-of-the-art two-step approaches, without the complexity of a multi-stage pipeline. Extending RRM into a two-step process improves performance further, to 66.3 on the validation set and 66.5 on the test set, which the authors report as a new state of the art.
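
For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU can be computed from a confusion matrix. The function name `mean_iou` and the `ignore_index` value of 255 are assumptions made for this example; this is not the official PASCAL VOC evaluation code.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean Intersection over Union from integer label maps of equal shape."""
    valid = target != ignore_index  # skip pixels without a ground-truth class
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    # conf[t, p] counts pixels of true class t that were predicted as class p.
    conf = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0  # average only over classes that actually occur
    return float((inter[present] / union[present]).mean())


pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. On PASCAL VOC the confusion matrix is accumulated over the whole evaluation set, with 21 classes (20 object categories plus background), before the per-class IoUs are averaged.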

Comparative Analysis

RRM compares favorably with other end-to-end weakly supervised models. Its one-step framework is simpler to train than multi-stage pipelines while maintaining comparable accuracy. Unlike methods such as EM-Adapt, which rely on expectation-maximization, RRM trains directly on the mined reliable regions and is optimized with a loss that combines cross-entropy and dense energy terms.

Implications and Future Developments

The RRM approach suggests that weakly supervised learning frameworks can use simpler model architectures and rely less on complex preprocessing steps. The authors demonstrate that competitive segmentation quality can be achieved without intricate two-step pseudo-label generation, which opens the way to more streamlined solutions in semantic segmentation. Future research may refine the dense energy loss function further and apply similar methods to other forms of weak supervision, such as bounding boxes or scribbles, potentially broadening the applicability of RRM in computer vision.


Authors (5)

Bingfeng Zhang
Jimin Xiao
Yunchao Wei
Mingjie Sun
Kaizhu Huang
