SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow (2207.04415v2)

Published 10 Jul 2022 in cs.CV

Abstract: In this paper, we focus on exploring effective methods for faster and accurate semantic segmentation. A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation. Two strategies are widely used: atrous convolutions and feature pyramid fusion, while both are either computationally intensive or ineffective. Inspired by the Optical Flow for motion alignment between adjacent video frames, we propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels and broadcast high-level features to high-resolution features effectively and efficiently. Furthermore, integrating our FAM to a standard feature pyramid structure exhibits superior performance over other real-time methods, even on lightweight backbone networks, such as ResNet-18 and DFNet. Then to further speed up the inference procedure, we also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps where we term the improved version network as SFNet-Lite. Extensive experiments are conducted on several challenging datasets, where results show the effectiveness of both SFNet and SFNet-Lite. In particular, when using Cityscapes test set, the SFNet-Lite series achieve 80.1 mIoU while running at 60 FPS using ResNet-18 backbone and 78.8 mIoU while running at 120 FPS using STDC backbone on RTX-3090. Moreover, we unify four challenging driving datasets into one large dataset, which we named Unified Driving Segmentation (UDS) dataset. It contains diverse domain and style information. We benchmark several representative works on UDS. Both SFNet and SFNet-Lite still achieve the best speed and accuracy trade-off on UDS, which serves as a strong baseline in such a challenging setting. The code and models are publicly available at https://github.com/lxtGH/SFSegNets.

Authors (7)

Xiangtai Li (128 papers)
Jiangning Zhang (102 papers)
Yibo Yang (80 papers)
Guangliang Cheng (55 papers)
Kuiyuan Yang (20 papers)
Yunhai Tong (69 papers)
Dacheng Tao (829 papers)

Citations (19)

Summary

The paper introduces Semantic Flow to dynamically align multi-scale features for efficient, real-time semantic segmentation.
It proposes innovative modules, FAM and GD-FAM, to resolve feature misalignment and enhance spatial detail.
Experimental results on the Cityscapes test set show SFNet-Lite reaching 80.1 mIoU at 60 FPS with a ResNet-18 backbone and 78.8 mIoU at 120 FPS with an STDC backbone on an RTX-3090.

Semantic Flow Networks for Efficient Semantic Segmentation

The paper, titled "SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow," introduces an approach to semantic segmentation built on a concept the authors call "Semantic Flow." Semantic segmentation, a core computer vision task, assigns a class label to every pixel in an image. A primary challenge in real-world scenarios is achieving high accuracy with low computational overhead, particularly for the high-resolution inputs required in applications like autonomous driving.

Methodology

The authors propose the Flow Alignment Module (FAM) and its improved variant, the Gated Dual Flow Alignment Module (GD-FAM). These modules are formulated to resolve the misalignment issue in conventional feature pyramid networks (FPNs). Misalignment arises due to the repetitive use of downsampling and upsampling operations, which affects semantic consistency across feature levels. Inspired by the optical flow concept common in video processing, the paper introduces Semantic Flow to dynamically align multi-level feature maps.

Flow Alignment Module (FAM): The FAM learns the semantic flow, a mapping between feature maps of adjacent levels, and uses it to broadcast high-level features onto the high-resolution map. It plugs into the standard FPN structure, combining the strong semantic representation of deep layers with the spatial detail of shallow ones.
Gated Dual Flow Alignment Module (GD-FAM): To further streamline the process and enhance inference speed, GD-FAM is introduced. It aligns high-resolution and low-resolution features directly, eliminating the need for multiple stages of alignment.
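
The alignment step can be illustrated with a minimal pure-Python sketch of flow-guided upsampling. It is not the authors' implementation: the function names `bilinear_sample` and `flow_warp` are hypothetical, the flow is assumed to be given as (dy, dx) offsets in high-resolution pixel units rather than predicted by a convolution, a single channel is used, and the resolution gap between levels is assumed to be a factor of 2.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2-D grid (list of rows) at fractional (y, x), clamped to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def flow_warp(low_res, flow, scale=2):
    """Upsample low_res to the resolution of flow, sampling at flow-shifted positions.

    flow[i][j] is an assumed (dy, dx) offset in high-resolution pixel units.
    With an all-zero flow this reduces to plain bilinear upsampling.
    """
    out = []
    for i, row in enumerate(flow):
        out_row = []
        for j, (dy, dx) in enumerate(row):
            # Shift the high-res position by the flow, then map it to low-res coordinates.
            out_row.append(bilinear_sample(low_res, (i + dy) / scale, (j + dx) / scale))
        out.append(out_row)
    return out


low = [[0.0, 1.0], [2.0, 3.0]]
zero_flow = [[(0.0, 0.0)] * 4 for _ in range(4)]
aligned = flow_warp(low, zero_flow)
```

With a zero flow the result is ordinary bilinear upsampling; a learned, non-zero flow lets each high-resolution position pull its value from a shifted location in the coarse map, which is the sense in which the features are "aligned" before fusion.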

Numerical Results and Dataset

The experimental section reports evaluations on several datasets, including Cityscapes, Mapillary, IDD, and BDD, as well as the newly assembled Unified Driving Segmentation (UDS) dataset. Key results include:

SFNet-Lite achieves 80.1 mIoU at 60 FPS using a ResNet-18 backbone on the Cityscapes test set.
With the GD-FAM module and the STDC backbone, SFNet-Lite achieves 78.8 mIoU at 120 FPS on an RTX-3090.
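
For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth labels. The sketch below is a generic illustration, not the evaluation code used in the paper; the function name `mean_iou` is hypothetical, labels are assumed to be flat lists of class ids, and classes absent from both prediction and ground truth are assumed to be skipped.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in pred or target (flat lists of class ids)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Benchmark scores are usually computed by accumulating intersections and unions over the whole evaluation set rather than averaging per-image values, which this function does if all pixels are passed in together.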

The merging of diverse datasets into the UDS is notable as it provides a more challenging and comprehensive evaluation benchmark, simulating varied driving conditions and styles.

Implications and Future Directions

This research has both theoretical and practical implications:

Theoretical Advancements: The Semantic Flow concept shows that learned alignment between feature maps of different levels improves fusion accuracy, an idea that may carry over to other computer vision tasks relying on multi-scale feature fusion.
Practical Applications: The proposed SFNet and its variant SFNet-Lite offer real-time processing capabilities crucial for applications in autonomous driving, surveillance, and robotics, where both accuracy and speed are paramount.

Future research could explore the applicability of the proposed modules in other tasks requiring efficient handling of high-resolution inputs, such as video segmentation, or further optimization of the Semantic Flow learning process itself. Integrating these techniques with transformer-based architectures may also yield complementary enhancements.

This paper provides a substantial contribution to the field of real-time semantic segmentation, balancing efficiency and performance with innovative alignment strategies.

GitHub

GitHub - lxtGH/SFSegNets: [ECCV-2020-oral]-Semantic Flow for Fast and Accurate Scene Parsing (357 stars)