- The paper presents a novel two-stream architecture that decouples shape and semantic processing using a gated convolutional layer.
- The paper demonstrates state-of-the-art performance on Cityscapes with 2% mIoU and 4% F-score gains, particularly improving segmentation of narrow and distant objects.
- The paper leverages a dual-task loss and an ASPP-based fusion module to enforce boundary alignment and preserve multi-scale context, improving overall segmentation accuracy.
Gated-SCNN: Enhanced Semantic Segmentation with Two-Stream CNN Architecture
Introduction
Semantic segmentation remains a critical task in computer vision, with applications ranging from autonomous driving to image generation. Convolutional Neural Networks (CNNs) have significantly advanced the field, but traditional architectures fuse color, shape, and texture information into a single processing stream, which can be suboptimal. This paper introduces Gated-SCNN, a two-stream architecture that separates shape processing from the conventional semantic stream, potentially yielding more accurate segmentation, particularly around object boundaries.
Methodology
The proposed Gated-SCNN architecture consists of two parallel streams: a regular stream for semantic features and a shape stream dedicated to boundary-related information. A gating mechanism connects the two, allowing higher-level semantic features from the regular stream to filter the shape stream's activations, sharpening the focus on relevant boundaries while suppressing noise.
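To make the data flow concrete, the following minimal sketch wires the two streams together in PyTorch. Every module name, layer width, and the single sigmoid gate are illustrative assumptions rather than the authors' implementation; the actual architecture uses a deep backbone, several gated layers, and an ASPP fusion head, as described below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamSketch(nn.Module):
    """Toy two-stream segmentation model: a strided 'regular' stream for
    semantics, a full-resolution 'shape' stream for boundaries, a sigmoid
    gate through which semantic features filter shape activations, and a
    simple fusion head. Layer sizes are placeholders, not the paper's."""

    def __init__(self, num_classes: int = 19):
        super().__init__()
        # Regular stream: strided convs standing in for a ResNet/WideResNet backbone.
        self.regular = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Shape stream: shallow convs kept at full image resolution.
        self.shape = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Gate: an attention map computed from both streams scales shape features.
        self.gate = nn.Conv2d(64 + 16, 1, kernel_size=1)
        self.boundary_head = nn.Conv2d(16, 1, kernel_size=1)
        # Fusion head: semantic features + boundary map -> per-pixel class logits.
        self.fusion = nn.Conv2d(64 + 1, num_classes, kernel_size=1)

    def forward(self, image):
        sem = self.regular(image)                       # (N, 64, H/4, W/4)
        shp = self.shape(image)                         # (N, 16, H, W)
        sem_up = F.interpolate(sem, size=shp.shape[-2:],
                               mode="bilinear", align_corners=False)
        # Gated-convolution idea: keep only boundary-relevant shape activations.
        alpha = torch.sigmoid(self.gate(torch.cat([sem_up, shp], dim=1)))
        boundary = torch.sigmoid(self.boundary_head(shp * alpha))  # (N, 1, H, W)
        logits = self.fusion(torch.cat([sem_up, boundary], dim=1))
        return logits, boundary


if __name__ == "__main__":
    model = TwoStreamSketch()
    logits, boundary = model(torch.randn(1, 3, 128, 256))
    print(logits.shape, boundary.shape)  # (1, 19, 128, 256) and (1, 1, 128, 256)
```

Keeping the shape stream shallow and at full resolution is what allows the gated features to retain thin structures, such as poles, that a heavily downsampled backbone tends to blur away.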
Core Components
- Regular Stream: This stream is a conventional CNN architecture responsible for capturing semantic features. It can be implemented using standard backbones such as ResNet or WideResNet.
- Shape Stream: Operates in parallel and focuses exclusively on extracting boundaries using a Gated Convolutional Layer (GCL). Because it only needs to represent boundary information, the shape stream can remain shallow while processing at full image resolution, which preserves fine boundary detail.
- Gated Convolutional Layer (GCL): Central to the architecture, GCLs use attention maps derived from the regular stream's features to filter the shape stream's activations, keeping boundary-relevant information and suppressing the rest.
- Fusion Module: Integrates features from both streams using Atrous Spatial Pyramid Pooling (ASPP), preserving multi-scale context and producing the refined segmentation output.
- Dual-Task Loss: Enforces consistency between boundary predictions and segmentation outputs, exploiting the duality of the two tasks to improve boundary alignment (a minimal sketch of the loss follows this list).
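As a rough illustration of the dual-task supervision, the sketch below (again assuming PyTorch) combines a cross-entropy term on the segmentation logits with a binary cross-entropy term on the predicted boundary map. The loss weights are placeholders, and the paper's additional regularizers, which explicitly couple segmentation-derived boundaries to the predicted edges, are omitted.

```python
import torch
import torch.nn.functional as F

def dual_task_loss(seg_logits, boundary_pred, seg_target, boundary_target,
                   lambda_seg=1.0, lambda_edge=20.0):
    """Minimal dual-task objective: segmentation cross-entropy plus boundary
    binary cross-entropy. Weights are illustrative, not the paper's values.

    seg_logits:      (N, C, H, W) raw class scores
    boundary_pred:   (N, 1, H, W) boundary probabilities in [0, 1]
    seg_target:      (N, H, W)    integer class labels
    boundary_target: (N, 1, H, W) binary ground-truth edge map
    """
    seg_loss = F.cross_entropy(seg_logits, seg_target)
    edge_loss = F.binary_cross_entropy(boundary_pred, boundary_target)
    return lambda_seg * seg_loss + lambda_edge * edge_loss


# Usage with the toy model sketched above (random labels, shapes only):
# logits, boundary = model(torch.randn(1, 3, 128, 256))
# loss = dual_task_loss(logits, boundary,
#                       torch.randint(0, 19, (1, 128, 256)),
#                       torch.randint(0, 2, (1, 1, 128, 256)).float())
```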
Experimental Evaluation
The efficacy of Gated-SCNN is demonstrated through extensive experiments on the Cityscapes benchmark, where the method achieves state-of-the-art performance with notable improvements in both mIoU and boundary quality (F-score).
- Quantitative Results: On the Cityscapes validation set, Gated-SCNN outperforms the previous state of the art by 2% in mIoU and 4% in boundary F-score. The largest gains appear on small and thin object categories, such as poles and traffic signs, with improvements of up to 7% in IoU.
- Distance-Based Evaluation: The approach also performs better on regions farther from the camera, with mIoU gains of up to 6% over baseline models.
Implications and Future Directions
The proposed architecture underscores the benefits of explicitly separating shape information in semantic segmentation tasks. The clear enhancement in boundary prediction quality has potential applications in domains requiring precise object delineation. Future research may explore further stream diversification or integration of additional auxiliary tasks to leverage more specific scene characteristics. Additionally, extending this architecture to real-time applications could be beneficial for use cases like autonomous navigation where computational efficiency is crucial.
Conclusion
Gated-SCNN represents a compelling evolution in semantic segmentation architecture: explicitly incorporating shape information through a separate processing stream and a gating mechanism significantly improves segmentation performance, particularly in challenging scenarios involving complex object boundaries.