Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
This paper presents an approach to weakly supervised semantic segmentation (WSSS) that emphasizes discovering objects outside the salient regions of an image. The authors propose non-salient region object mining to improve segmentation models trained with weak annotations such as image-level labels. The central challenge addressed is that pseudo labels are typically derived from regions of high saliency, so objects in peripheral or non-conspicuous parts of the image are frequently missed and wrongly treated as background.
Methodology
The proposed solution involves several key components that together improve the ability of segmentation models to recognize and correctly classify objects in non-salient regions:
- Graph-Based Global Reasoning Unit: Integrated into the classification network, this unit captures global relationships among disjoint and distant regions of an image. Unlike the local relation modeling of standard convolutions, it helps activate features of objects that lie outside the immediately conspicuous areas; a sketch of this style of unit follows the list.
- Potential Object Mining (POM) Module: This component reduces false negatives in the pseudo labels. It exploits the complementary qualities of CAMs (Class Activation Maps) and OA-CAMs (Online Accumulated Class Attention Maps) to identify potential object areas in non-salient regions; a sketch of one plausible mining rule appears after the list.
- Non-Salient Region Masking (NSRM) Module: This module further refines training by combining the predictions of an initially trained segmentation model with the pseudo labels to produce masked labels, particularly for complex images containing multiple object categories; a sketch of one plausible masking rule is given below.
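The global reasoning component follows the general pattern of graph-based global reasoning units: features are projected onto a small set of graph nodes, relations are propagated over the node graph, and the result is projected back onto the feature map. The PyTorch sketch below illustrates that pattern only; the layer sizes (`num_nodes`, `node_channels`) and the omission of any projection normalisation are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GlobalReasoningUnit(nn.Module):
    """Minimal GloRe-style unit: project features onto a small set of graph
    nodes, reason over the node graph with 1x1 convolutions, project back."""

    def __init__(self, in_channels, num_nodes=64, node_channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, node_channels, 1)       # feature reduction
        self.project = nn.Conv2d(in_channels, num_nodes, 1)          # pixel-to-node assignment
        self.gcn_adj = nn.Conv1d(num_nodes, num_nodes, 1)            # mixing across nodes
        self.gcn_state = nn.Conv1d(node_channels, node_channels, 1)  # node state update
        self.expand = nn.Conv2d(node_channels, in_channels, 1)       # back to input channels

    def forward(self, x):
        b, c, h, w = x.shape
        feats = self.reduce(x).view(b, -1, h * w)              # B x C' x HW
        assign = self.project(x).view(b, -1, h * w)             # B x N x HW
        nodes = torch.bmm(assign, feats.transpose(1, 2))        # B x N x C' : pixels -> nodes
        nodes = nodes + self.gcn_adj(nodes)                      # relation reasoning over nodes
        nodes = self.gcn_state(nodes.transpose(1, 2)).transpose(1, 2)
        out = torch.bmm(assign.transpose(1, 2), nodes)           # B x HW x C' : nodes -> pixels
        out = out.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.expand(out)                              # residual fusion with the input
```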
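For the potential object mining step, the key signal is the gap between the plain CAM and the online accumulated OA-CAM inside regions currently labelled as background. The snippet below is a minimal NumPy sketch of one plausible rule, assuming mined pixels are set to the ignore index so they stop acting as false-negative background; the threshold and this choice are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

IGNORE = 255  # standard "ignore" index in VOC-style pseudo labels

def potential_object_mining(pseudo_label, cam, oa_cam, image_classes, thresh=0.3):
    """Sketch of a POM-style rule (assumed, not the paper's exact procedure).

    pseudo_label  : (H, W) int array of class indices, 0 = background
    cam, oa_cam   : (C, H, W) float arrays, normalised to [0, 1] per class
    image_classes : iterable of class indices present in the image-level label
    """
    refined = pseudo_label.copy()
    background = pseudo_label == 0
    for c in image_classes:
        # strong accumulated attention but weak vanilla CAM -> likely a missed object
        potential = background & (oa_cam[c] > thresh) & (cam[c] <= thresh)
        refined[potential] = IGNORE
    return refined
```

Marking mined pixels as ignore is the conservative choice here: it removes the erroneous background supervision without committing to a possibly noisy foreground label.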
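The non-salient region masking step can be read as reconciling the first-round segmentation prediction with the pseudo label on multi-category images. The sketch below encodes one plausible masking rule under that reading, assuming that non-salient background pixels where the model disagrees with the pseudo label are excluded from training; the concrete rule in the paper may differ.

```python
import numpy as np

IGNORE = 255

def non_salient_region_masking(pseudo_label, seg_pred, saliency, num_image_classes):
    """Sketch of an NSRM-style step (assumed rule, for illustration only).

    pseudo_label      : (H, W) int pseudo label, 0 = background
    seg_pred          : (H, W) int prediction of the initially trained model
    saliency          : (H, W) bool map, True inside the salient region
    num_image_classes : number of object categories in the image-level label
    """
    if num_image_classes < 2:          # simple images are left untouched
        return pseudo_label
    masked = pseudo_label.copy()
    non_salient_bg = (~saliency) & (pseudo_label == 0)
    model_sees_object = seg_pred != 0  # the first-round model predicts a foreground class
    masked[non_salient_bg & model_sees_object] = IGNORE
    return masked
```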
Results
The paper reports state-of-the-art results on the PASCAL VOC benchmark, outperforming several existing weakly supervised methods. The proposed model achieves mIoU scores of 65.5% on the validation set and 65.3% on the test set with a VGG backbone, and 68.3% and 68.5% respectively with a ResNet backbone. With additional MS-COCO pre-training, the results improve further to 70.4% on validation and 70.2% on the test set.
Implications and Future Work
This paper offers valuable insights for weakly supervised learning, providing a methodology that does not depend on labor-intensive pixel-level annotations. By explicitly attending to non-salient areas, the approach keeps the low annotation cost of image-level supervision while making the resulting segmentation more complete. Future work could adapt this framework to other forms of weak supervision and examine its applicability to larger and more complex datasets such as MS-COCO or Cityscapes, which may demand further scalability and efficiency improvements. Integration with contemporary paradigms such as semi-supervised and unsupervised learning could also refine its effectiveness and broaden its applicability in real-world scenarios.