Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation (1805.04574v2)

Published 11 May 2018 in cs.CV

Abstract: Despite the remarkable progress, weakly supervised segmentation approaches are still inferior to their fully supervised counterparts. We observe the performance gap mainly comes from their limitation on learning to produce high-quality dense object localization maps from image-level supervision. To mitigate such a gap, we revisit the dilated convolution [1] and reveal how it can be utilized in a novel way to effectively overcome this critical limitation of weakly supervised segmentation approaches. Specifically, we find that varying dilation rates can effectively enlarge the receptive fields of convolutional kernels and more importantly transfer the surrounding discriminative information to non-discriminative object regions, promoting the emergence of these regions in the object localization maps. Then, we design a generic classification network equipped with convolutional blocks of different dilated rates. It can produce dense and reliable object localization maps and effectively benefit both weakly- and semi- supervised semantic segmentation. Despite the apparent simplicity, our proposed approach obtains superior performance over state-of-the-arts. In particular, it achieves 60.8% and 67.6% mIoU scores on Pascal VOC 2012 test set in weakly- (only image-level labels are available) and semi- (1,464 segmentation masks are available) supervised settings, which are the new state-of-the-arts.

Authors (6)

Yunchao Wei (151 papers)
Huaxin Xiao (7 papers)
Honghui Shi (22 papers)
Zequn Jie (60 papers)
Jiashi Feng (295 papers)
Thomas S. Huang (65 papers)

Citations (526)


Summary

The paper revisits dilated convolutions to expand receptive fields, significantly improving object localization in weakly- and semi-supervised segmentation.
It adds convolutional blocks with different dilation rates to a classification network to generate dense localization maps, which in turn benefit segmentation. Dilation itself enlarges the receptive field without adding kernel parameters.
The method achieved mIoU scores of 60.8% (weakly supervised) and 67.6% (semi-supervised) on the Pascal VOC 2012 test set, which the authors report as state-of-the-art for both settings.
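
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean intersection-over-union (mIoU) score can be computed from predicted and ground-truth class labels. The function name mean_iou and the toy labels are illustrative assumptions, not taken from the paper, and the official Pascal VOC evaluation accumulates counts over the whole test set rather than over a handful of pixels.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flat lists of class indices.

    Classes that appear in neither pred nor target are skipped.
    """
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels (hypothetical labels).
pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
print(mean_iou(pred, target, num_classes=3))  # 0.5
```

Each class here has an IoU of 0.5 (1/2, 2/4, and 1/2), so the mean is 0.5.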

Analysis of "Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation"

The paper uses dilated convolution to improve weakly- and semi-supervised semantic segmentation. Weakly supervised frameworks struggle to produce dense object localization maps from image-level labels alone, because a classification network tends to respond only to the most discriminative parts of an object. The authors address this by applying dilated convolution in a new way: it not only enlarges the receptive fields but also transfers discriminative information to non-discriminative object regions, yielding denser object localization maps.

Key Contributions

Dilated Convolution Blocks: The research revisits dilated convolution, a method initially introduced for augmenting the receptive field size without additional computational costs or parameters. By varying dilation rates, it achieves an effective transfer of discriminative information, bolstering the localization of otherwise non-discriminative object regions.
Classification Network Augmentation: The paper introduces a classification network enhanced with several convolutional blocks, each employing different dilation rates. This configuration yields dense and precise object localization maps capable of significantly improving semantic segmentation performance in both weakly- and semi-supervised contexts.
Strong Results: The approach achieves mIoU scores of 60.8% and 67.6% on the Pascal VOC 2012 test set in the weakly supervised setting (image-level labels only) and the semi-supervised setting (1,464 segmentation masks available), respectively, which the authors report as state-of-the-art for both settings.
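
The receptive-field argument above can be illustrated with a small sketch. It uses a 1-D signal and a 3-tap kernel with hand-picked weights, both assumptions chosen only for illustration, since the paper works with 2-D feature maps and learned kernels. The sketch shows that a kernel with dilation d spans k + (k - 1)(d - 1) input positions, so a single strongly activated position can influence outputs up to d positions away.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span, in input positions, covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1-D dilated cross-correlation with zero padding."""
    half = len(kernel) // 2
    out = []
    for center in range(len(signal)):
        total = 0.0
        for tap, weight in enumerate(kernel):
            pos = center + (tap - half) * dilation
            if 0 <= pos < len(signal):
                total += weight * signal[pos]
        out.append(total)
    return out


# Toy response row: only position 10 (a "discriminative part") is active.
response = [0.0] * 21
response[10] = 1.0
kernel = [1.0, 1.0, 1.0]  # hypothetical weights, for illustration only

for d in (1, 3, 6, 9):
    out = dilated_conv1d(response, kernel, d)
    reached = [i for i, v in enumerate(out) if v > 0]
    print(f"d={d}: span={effective_kernel_size(3, d)}, reached={reached}")
# d=1: span=3, reached=[9, 10, 11]
# d=3: span=7, reached=[7, 10, 13]
# d=6: span=13, reached=[4, 10, 16]
# d=9: span=19, reached=[1, 10, 19]
```

With d = 1 the active position only affects its immediate neighbours, while larger rates carry it to progressively more distant positions. This is the intuition behind using several rates together.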

Experimental Setup and Results

Using dilation rates d = 1, 3, 6, 9, the authors illustrate how localization maps produced at different rates cover different extents of an object, rather than only its most discriminative part. Because individual maps can be noisy, the proposed fusion strategy combines the maps from the different rates, suppressing false positives and reinforcing true object regions. This matters for the downstream segmentation stage, where techniques such as Conditional Random Fields (CRFs) further refine the segmentation outputs.
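
One way such a fusion could work is sketched below. It assumes the fused map is the d = 1 map plus the element-wise average of the maps from the larger rates. The summary above only says the maps are combined, so this exact formula, the function name fuse_localization_maps, and the toy values are all assumptions for illustration.

```python
def fuse_localization_maps(standard_map, dilated_maps):
    """Return standard_map plus the element-wise mean of dilated_maps.

    All maps are 2-D lists of floats with identical shapes.
    """
    n = len(dilated_maps)
    rows, cols = len(standard_map), len(standard_map[0])
    return [
        [
            standard_map[r][c] + sum(m[r][c] for m in dilated_maps) / n
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 1x4 maps (hypothetical values). Column 1 is an object region that all
# dilated maps agree on; column 3 is a false positive seen only at d = 6.
h_d1 = [[0.9, 0.1, 0.0, 0.0]]
h_d3 = [[0.8, 0.6, 0.0, 0.0]]
h_d6 = [[0.7, 0.6, 0.0, 0.6]]
h_d9 = [[0.6, 0.6, 0.3, 0.0]]

fused = fuse_localization_maps(h_d1, [h_d3, h_d6, h_d9])
print([round(v, 2) for v in fused[0]])  # [1.6, 0.7, 0.1, 0.2]
```

Averaging keeps the response that all dilated maps agree on (0.6 in column 1) but attenuates the response that appears in only one map (0.6 becomes 0.2 in column 3).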

Theoretical and Practical Implications

Theoretically, this research underscores the potential of revisiting and repurposing existing techniques, in this instance dilated convolutions, for new applications. Practically, the method offers a simple way to improve weakly supervised segmentation, which is relevant wherever annotated data is limited. This has immediate implications for fields that require semantic segmentation but are hindered by costly annotation processes, such as medical imaging or autonomous driving.

Future Directions

The promising results suggest opportunities for extending this work to larger and more complex datasets, like MS COCO or ImageNet, which could validate and potentially enhance the approach. Moreover, addressing identified failure cases, such as inefficient propagation in single-direction object discrimination, could further refine the model's robustness. This paper lays the groundwork for future advancements in using less supervision to achieve increasingly sophisticated levels of semantic understanding in AI systems.

By exploiting dilated convolutions in this way, the research contributes to the development of scalable, efficient, and accurate weakly- and semi-supervised semantic segmentation techniques.
