Improving Semantic Segmentation via Decoupled Body and Edge Supervision (2007.10035v2)

Published 20 Jul 2020 in cs.CV

Abstract: Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object detail along boundaries by multi-scale feature fusion. In this paper, a new paradigm for semantic segmentation is proposed. Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low and high frequency of the image, respectively. To do so, we first warp the image feature by learning a flow field to make the object part more consistent. The resulting body feature and the residual edge feature are further optimized under decoupled supervision by explicitly sampling pixels from different parts (body or edge). We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries. Extensive experiments on four major road scene semantic segmentation benchmarks including Cityscapes, CamVid, KITTI and BDD show that our proposed approach establishes new state of the art while retaining high efficiency in inference. In particular, we achieve 83.7% mIoU on Cityscapes with only fine-annotated data. Code and models are made available to foster any further research (https://github.com/lxtGH/DecoupleSegNets).

Authors (8)

Xiangtai Li (128 papers)
Xia Li (101 papers)
Li Zhang (693 papers)
Guangliang Cheng (55 papers)
Jianping Shi (76 papers)
Zhouchen Lin (158 papers)
Shaohua Tan (1 paper)
Yunhai Tong (69 papers)

Citations (237)


Summary

The paper proposes a decoupled framework that independently supervises body and edge features to enhance segmentation precision.
It learns a flow field that warps image features to make each object's interior more consistent, and treats the residual as an edge feature that carries boundary detail.
Experiments on road-scene benchmarks such as Cityscapes show that the lightweight module improves on the baseline networks it is added to.

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

The paper "Improving Semantic Segmentation via Decoupled Body and Edge Supervision" presents an approach to semantic segmentation that models two distinct parts of each object separately. Semantic segmentation, a pivotal task in computer vision, assigns a class label to each pixel in an image and underpins visual understanding in applications such as autonomous driving and medical imaging. Existing methods mostly either strengthen consistency within an object by leveraging global context, or refine detail along object boundaries via multi-scale feature fusion. This paper instead models the object body and the object edge explicitly and supervises each with its own signal to improve segmentation accuracy.

Key Contributions

Decoupled Supervision Framework: The authors introduce a unique decoupled framework that separates the supervision of the body and edge components of an image. The body feature captures the low-frequency content that retains smooth structures, whereas the edge feature targets high-frequency details that define object boundaries.
Flow-Based Body Feature Generation: The framework learns a flow (offset) field and uses it to warp the image features toward object interiors, which makes features within the same object more consistent. The warped result is the body feature.
Edge Component Extraction: The edge feature is the residual obtained by subtracting the body feature from the input feature, so it retains the high-frequency detail that the warp smooths away. Handling this residual separately lets the network refine object boundaries without disturbing the smoother body representation.
Complementary Supervision: The body and edge features are optimized under separate supervisory signals, each computed on pixels explicitly sampled from the corresponding part (body or edge) and each with its own loss function. The two sub-tasks are intended to complement each other in improving overall segmentation performance.
Lightweight and Integrative Module: The framework can be added to existing fully convolutional segmentation networks, across various baselines and backbones, with little overhead, so inference remains efficient.
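The decomposition described in the list above can be illustrated with a small NumPy sketch. It is not the authors' implementation (the released code at the GitHub link in the abstract is the authority). The function names bilinear_warp, decouple and boundary_mask are hypothetical, the flow field is a random stand-in for the one the network would learn, and deriving the edge mask from label changes between neighbouring pixels is an assumption made here for illustration.

```python
import numpy as np


def bilinear_warp(feat, flow):
    """Resample feat (C, H, W) at positions shifted by flow (2, H, W).

    flow[0] holds x-offsets and flow[1] holds y-offsets, in pixels.
    Sampling positions are clamped to the border of the feature map.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # horizontal interpolation weight
    wy = y - y0  # vertical interpolation weight
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def decouple(feat, flow):
    """Split a feature map into a warped body part and a residual edge part."""
    body = bilinear_warp(feat, flow)
    edge = feat - body  # residual: what the warp removed
    return body, edge


def boundary_mask(labels):
    """Mark pixels that have a 4-neighbour with a different class label.

    Assumption for illustration: such a mask selects the pixels used for
    edge supervision, and its complement selects the body pixels.
    """
    mask = np.zeros(labels.shape, dtype=bool)
    diff_x = labels[:, 1:] != labels[:, :-1]
    diff_y = labels[1:, :] != labels[:-1, :]
    mask[:, 1:] |= diff_x
    mask[:, :-1] |= diff_x
    mask[1:, :] |= diff_y
    mask[:-1, :] |= diff_y
    return mask


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 16, 16))
flow = rng.uniform(-2, 2, size=(2, 16, 16))  # stand-in for a learned flow field
body, edge = decouple(feat, flow)
print(body.shape, edge.shape)
```

By construction the two parts sum back to the input feature, so the split loses no information. In a trained model the body and edge features would each feed a loss restricted to the corresponding pixels.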

Experimental Results

Experiments were conducted on four road-scene benchmarks, Cityscapes, CamVid, KITTI, and BDD, and the authors report consistent improvements over the baseline models on each. In particular, the approach achieves 83.7% mIoU on Cityscapes using only fine-annotated data, which the authors present as a new state of the art.
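For readers unfamiliar with the metric, mIoU (mean intersection over union) averages, over classes, the ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. The sketch below shows one common way to compute it from a confusion matrix. The function names are hypothetical, and the ignore label of 255 and the choice to skip classes absent from both prediction and ground truth are assumptions, not details taken from the paper.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) label pairs, skipping pixels marked ignore_index."""
    valid = target != ignore_index
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Average per-class IoU over classes that occur in pred or target."""
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())


target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
conf = confusion_matrix(pred, target, num_classes=2)
print(mean_iou(conf))  # class IoUs are 1/2 and 2/3, so the mean is 7/12
```

Benchmark scores are normally computed by accumulating one confusion matrix over the whole evaluation set before taking the ratio, rather than averaging per-image values.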

Implications and Future Work

The decoupled body and edge supervision strategy offers a different angle on a long-standing tension in semantic segmentation between consistent object interiors and sharp boundaries. It suggests that different image components benefit from targeted treatment, which could inspire future work on more granular or hierarchical decoupled structures.

Potential future developments might extend this decoupled supervision to video, where temporal coherence adds another dimension of complexity. Further analysis could also examine how well the flow-based approach holds up across domains and whether the flow can be adapted at inference time.

Overall, this work shows that explicitly separating body and edge modeling within a convolutional architecture can improve both interior consistency and boundary quality at little added cost, which makes it a useful building block for efficient, accurate segmentation models.
