- The paper introduces a novel learning to downsample module that streamlines computation and handles multi-resolution branches efficiently.
- The paper achieves 68.0% mIoU on Cityscapes at 123.5 fps, demonstrating its balance of speed and performance for high-resolution inputs.
- The paper presents a compact model design with 1.11M parameters that reduces memory requirements, enabling deployment on resource-constrained devices.
Fast-SCNN: Fast Semantic Segmentation Network
This paper introduces Fast-SCNN, a semantic segmentation model designed for high-resolution image data that runs at above real-time speeds. Its primary contribution is efficient computation on embedded devices with limited memory, a need driven by the increasing demands of autonomous systems and other real-time applications.
Key Contributions
- Learning to Downsample Module: Fast-SCNN incorporates a novel 'learning to downsample' module that reduces the cost of the network's initial layers. The module computes low-level features once and shares them across the different resolution branches, avoiding redundant calculation. This path consolidation is akin to a skip connection but kept deliberately shallow to preserve runtime efficiency (a sketch follows this list).
- Performance Metrics: On the Cityscapes dataset, Fast-SCNN achieves 68.0% mean Intersection over Union (mIoU) while operating at an impressive 123.5 frames per second (fps) on high-resolution inputs (1024×2048px) using a modern GPU.
- Minimal Pre-training Requirements: The research shows that extensive pre-training on large-scale datasets such as ImageNet is not essential for this model. Fast-SCNN achieves competitive results without pre-training, and using additional coarse-labeled data yields only a minimal improvement (+0.5% mIoU).
- Compact Model Design: With only 1.11 million parameters, Fast-SCNN significantly reduces memory requirements, facilitating deployment on embedded systems.
- Versatile Input Handling: The model's ability to operate on subsampled input without the need for network redesign is practical for diverse application environments where varying input resolutions are commonplace.
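To make the module concrete, here is a minimal PyTorch sketch of a learning-to-downsample stage in the spirit of the paper: one standard stride-2 convolution followed by two stride-2 depthwise separable convolutions, cutting resolution by 8×. The channel widths (32, 48, 64) follow the paper's architecture table, but the code itself is an illustrative reconstruction, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: depthwise 3x3 followed by pointwise 1x1."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # Depthwise: one 3x3 filter per input channel (groups=in_ch).
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # Pointwise: 1x1 convolution mixes channels.
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class LearningToDownsample(nn.Module):
    """Three stride-2 stages (conv + two DSConvs) reduce resolution by 8x;
    channel widths (32, 48, 64) follow the paper's architecture table."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        self.dsconv1 = DSConv(32, 48, stride=2)
        self.dsconv2 = DSConv(48, 64, stride=2)

    def forward(self, x):
        return self.dsconv2(self.dsconv1(self.conv(x)))

# A 1024x2048 Cityscapes frame comes out at 128x256 with 64 channels.
features = LearningToDownsample()(torch.randn(1, 3, 1024, 2048))
print(features.shape)  # torch.Size([1, 64, 128, 256])
```

The output of this shared stage is what both branches consume, which is why the paper can avoid recomputing early features per branch.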
Methodology
Fast-SCNN employs a two-branch structure in which the shared learning to downsample module computes initial features. These feed a deeper global feature extraction pathway at reduced resolution and are simultaneously reused at higher resolution, so the model retains spatial detail while capturing contextual information; a feature fusion module then merges the two paths before classification. Depthwise separable convolutions and inverted residual bottleneck blocks form the backbone of this architecture, ensuring both efficiency and efficacy.
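The inverted residual bottleneck mentioned above follows the MobileNetV2-style pattern: a 1×1 expansion, a 3×3 depthwise convolution, and a linear 1×1 projection, with a residual connection when input and output shapes match. The sketch below, with an assumed expansion factor of 6, illustrates the pattern rather than reproducing the paper's exact block.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style bottleneck: 1x1 expand -> 3x3 depthwise -> 1x1 project,
    with a residual connection when input and output shapes match."""
    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),         # expand
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride,      # depthwise
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),        # project (linear)
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# Stacking such blocks at low resolution approximates the global feature extractor.
x = torch.randn(1, 64, 32, 64)
print(InvertedResidual(64, 64)(x).shape)  # torch.Size([1, 64, 32, 64])
```

Because the depthwise convolution dominates the block's receptive field while the 1×1 layers stay cheap, stacks of these blocks give the global branch depth without a heavy parameter budget.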
Comparative Analysis
The paper situates Fast-SCNN's performance within the context of existing real-time segmentation models such as ICNet and BiSeNet. Although Fast-SCNN's mIoU is slightly lower than that of some of these models, its speed and smaller computational footprint are substantial advantages, especially on embedded devices.
Implications and Future Directions
The implications of Fast-SCNN are significant for real-time AI applications, particularly in autonomous systems where latency and computational efficiency are paramount. The paper suggests that with further refinement, Fast-SCNN could be used in applications like augmented reality on wearables.
Future exploration could integrate quantization and compression techniques to enhance Fast-SCNN's deployment efficiency on even more constrained hardware. Given its modular structure, Fast-SCNN's architecture could also be combined with more advanced feature fusion strategies to push accuracy while maintaining speed. Similar networks could likewise be optimized for other tasks involving real-time processing of high-resolution data, such as object detection and video analysis.
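As a hint of what such deployment work might look like, the following is a minimal sketch of post-training static quantization with PyTorch's eager-mode API, applied to a small stand-in model rather than the authors' network; the TinySegNet module, its layer widths, and the random calibration data are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Tiny stand-in segmentation head; a real use would quantize the full network."""
    def __init__(self, num_classes=19):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 entry point
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 exit point
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, num_classes, 1),
        )

    def forward(self, x):
        return self.dequant(self.body(self.quant(x)))

model = TinySegNet().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
prepared = torch.quantization.prepare(model)  # insert calibration observers

# Calibrate activation ranges on representative data (random here; real frames in practice).
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 256, 512))

quantized = torch.quantization.convert(prepared)  # int8 weights and activations
out = quantized(torch.randn(1, 3, 256, 512))
print(out.shape)  # torch.Size([1, 19, 128, 256])
```

Since Fast-SCNN's parameter count is already small, the practical gain from int8 quantization would mostly be in activation memory and inference latency on hardware with int8 support.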
In conclusion, Fast-SCNN positions itself as an efficient semantic segmentation solution, balancing speed and resource utilization, making it highly suitable for real-time applications across various industrial domains.