Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation (1904.11693v1)

Published 26 Apr 2019 in cs.CV

Abstract: Semantic segmentation has achieved huge progress via adopting deep Fully Convolutional Networks (FCN). However, the performance of FCN-based models relies heavily on the amount of pixel-level annotations, which are expensive and time-consuming to obtain. To address this problem, it is a good choice to learn to segment with weak supervision from bounding boxes. How to make full use of the class-level and region-level supervision from bounding boxes is the critical challenge for this weakly supervised learning task. In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposals generated from the bounding box supervision, we calculate the mean filling rate of each class to serve as an important prior cue, and then propose a filling rate guided adaptive loss (FR-Loss) to help the model ignore the wrongly labeled pixels in proposals. Unlike previous methods that directly train models with fixed individual segment proposals, our method can adjust the model learning with global statistical information, and thus helps reduce the negative impact of wrongly labeled proposals. We evaluate the proposed method on the challenging PASCAL VOC 2012 benchmark and compare it with other methods. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.

An Analytical Overview of "Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation"

The paper addresses the challenge of semantic segmentation under weak supervision using bounding boxes as an alternative to expensive pixel-level annotations. The authors introduce a novel methodology leveraging two key innovations: the Box-driven Class-wise Masking (BCM) model and the Filling Rate guided adaptive loss (FR-loss), aimed at enhancing the performance of weakly supervised segmentation tasks. Both approaches distinctly improve the ability of models to generate accurate semantic masks, demonstrating potential utility in scenarios where full supervision is infeasible.

The BCM model operates by learning specific masks for each class in the segmentation task, enabling it to selectively focus on relevant features and reduce the negative impact of background clutter in bounding boxes. This class-wise attention mechanism provides a direct advantage in comparison to global attention models, allowing more precise object shape learning across various classes. The capability to accurately mask irrelevant regions empowers the feature learning process, particularly in weakly supervised settings where precise pixel information is absent.
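
As a rough illustration of class-wise region masking, the Python sketch below builds one binary mask per class from box annotations and multiplies it into a single-channel feature map. It is not the authors' implementation: in the paper the masks are learned by the network, whereas here the raw box region stands in for a learned mask. The names `box_class_masks` and `apply_mask`, the box format, and the toy feature map are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), with max exclusive.
Box = Tuple[int, int, int, int]
Grid = List[List[float]]


def box_class_masks(
    height: int, width: int, boxes: List[Tuple[str, Box]]
) -> Dict[str, List[List[int]]]:
    """Build one binary mask per class: 1 inside any box of that class, else 0."""
    masks: Dict[str, List[List[int]]] = {}
    for cls, (x0, y0, x1, y1) in boxes:
        mask = masks.setdefault(cls, [[0] * width for _ in range(height)])
        # Clip the box to the image so out-of-range coordinates are ignored.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return masks


def apply_mask(features: Grid, mask: List[List[int]]) -> Grid:
    """Suppress features outside the class region by element-wise multiplication."""
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]


# Toy usage: a 4x4 feature map with one "cat" box and one "dog" box.
features = [[1.0] * 4 for _ in range(4)]
masks = box_class_masks(4, 4, [("cat", (0, 0, 2, 2)), ("dog", (2, 2, 4, 4))])
cat_features = apply_mask(features, masks["cat"])
```

Each class gets its own masked copy of the features, so responses outside that class's boxes are zeroed before any class-specific prediction is made.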

In parallel, the Filling Rate guided adaptive loss (FR-loss) uses the filling rate, defined as the proportion of foreground pixels within a bounding box, to guide learning by separating confident predictions from erroneous ones. By computing the mean filling rate for each class, the model adaptively sets thresholds for training, effectively filtering out less reliable pixel-level proposals. Building this statistical guidance into the loss function is a pragmatic way to mitigate the inaccuracies inherent in weakly labeled datasets.
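
The sketch below shows one plausible way to turn the filling-rate idea into code. It computes the filling rate of each proposal, averages it per class, then keeps only the top-scoring fraction of pixels in a box when computing a loss. The top-fraction selection rule, the negative log-likelihood loss, and all function names are assumptions made for illustration; the overview above does not specify the exact formulation used in the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Mask = List[List[int]]
Scores = List[List[float]]


def filling_rate(proposal: Mask) -> float:
    """Fraction of pixels in a box crop that the proposal marks as foreground."""
    total = sum(len(row) for row in proposal)
    foreground = sum(sum(row) for row in proposal)
    return foreground / total if total else 0.0


def mean_filling_rates(samples: List[Tuple[str, Mask]]) -> Dict[str, float]:
    """Average the filling rate over all proposals of each class."""
    rates: Dict[str, List[float]] = defaultdict(list)
    for cls, proposal in samples:
        rates[cls].append(filling_rate(proposal))
    return {cls: sum(values) / len(values) for cls, values in rates.items()}


def select_confident_pixels(scores: Scores, rate: float) -> Mask:
    """Mark the top `rate` fraction of pixels in the box, ranked by score."""
    ranked = sorted(
        ((s, y, x) for y, row in enumerate(scores) for x, s in enumerate(row)),
        reverse=True,
    )
    keep = round(rate * len(ranked))
    selected = [[0] * len(row) for row in scores]
    for _, y, x in ranked[:keep]:
        selected[y][x] = 1
    return selected


def masked_nll(scores: Scores, selected: Mask) -> float:
    """Mean negative log-likelihood over the selected pixels only."""
    losses = [
        -math.log(s)
        for s_row, m_row in zip(scores, selected)
        for s, m in zip(s_row, m_row)
        if m
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: two "cat" proposals, then a 2x2 box of foreground scores.
class_rates = mean_filling_rates(
    [("cat", [[1, 1], [0, 0]]), ("cat", [[1, 1], [1, 1]])]
)
box_scores = [[0.9, 0.1], [0.8, 0.2]]
selected = select_confident_pixels(box_scores, 0.5)
loss = masked_nll(box_scores, selected)
```

Because the kept fraction comes from a class-level statistic and not from any single proposal, pixels that an individual proposal labels wrongly can fall outside the selected set and contribute nothing to the loss.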

The proposed methods are validated extensively on the PASCAL VOC 2012 benchmark, where they achieve state-of-the-art results relative to existing weakly supervised methods. Implemented jointly, the two techniques yield performance close to that of models trained under full supervision. This is particularly significant given that training relies solely on bounding box annotations, a notable reduction in labeling overhead compared with pixel-level annotation.

These contributions have implications for future work on weakly supervised learning and segmentation. Practically, the success of this approach could encourage wider adoption of bounding-box supervision, making semantic segmentation more accessible. Theoretically, the reliance on class-level and region-level statistical information invites further research into similar statistical or heuristic-guided methods that might further reduce the dependence on detailed annotations.

In conclusion, this paper introduces effective mechanisms for enhancing weakly supervised semantic segmentation, narrowing the gap between weak and full supervision by making effective use of bounding box data. As the field evolves, more sophisticated guidance mechanisms and adaptive learning strategies are likely to further improve automated segmentation across many applications.

Authors (4)

Chunfeng Song
Yan Huang
Wanli Ouyang
Liang Wang

Citations (206)
