- The paper introduces the Joint Pyramid Upsampling module to replace dilated convolutions, reducing computational complexity by over three times while preserving segmentation performance.
- The method leverages multi-scale context from multi-level feature maps to efficiently generate high-resolution outputs within a classical FCN framework.
- FastFCN achieves state-of-the-art benchmark results, including a 53.13% mIoU on Pascal Context, with faster inference suitable for real-time applications.
Analysis of FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
The paper "FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation" addresses the computational inefficiency of dilated convolutions in semantic segmentation networks. The authors propose a Joint Pyramid Upsampling (JPU) module that replaces the dilated convolutions in the backbone while producing feature maps of the same high resolution at a substantially lower computational cost.
Contributions and Methodology
The primary contribution of this work is the JPU module, which tackles the challenge of efficiently generating high-resolution feature maps. Dilated convolutions are typically used for this purpose, but keeping feature maps at high resolution throughout the late backbone stages sharply increases computation and memory usage. The JPU module instead reformulates the task as a joint upsampling problem, leveraging multi-scale context across multi-level feature maps, and reduces computational complexity by more than three times with no loss in segmentation performance.
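The first stage of this joint upsampling can be pictured as bringing the backbone's last three feature maps to a common resolution and stacking them. Below is a minimal NumPy sketch of that step; the function names, channel counts, and the use of nearest-neighbor upsampling are illustrative assumptions, not the paper's exact implementation (which uses learned convolutions and bilinear upsampling).

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def jpu_merge(c3, c4, c5):
    """Sketch of the JPU's merging step: bring the backbone's last three
    stages (strides 8, 16, 32) to a common stride-8 resolution and
    concatenate them along the channel axis. The real module then runs
    parallel separable convolutions over this stack; only the joint
    upsampling is shown here."""
    c4_up = upsample2x(c4)               # stride 16 -> stride 8
    c5_up = upsample2x(upsample2x(c5))   # stride 32 -> stride 8
    return np.concatenate([c3, c4_up, c5_up], axis=0)
```

With typical ResNet channel widths, merging (256, 64, 64), (512, 32, 32), and (1024, 16, 16) maps yields a single (1792, 64, 64) tensor from which the high-resolution output is predicted.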
Framework Overview
FastFCN employs a classical Fully Convolutional Network (FCN) structure as the backbone, so feature-map resolution shrinks at the usual strides rather than being held high by dilated convolutions. The JPU module then reconstructs a high-resolution feature map from the last three backbone stages, after which a multi-scale context head produces the final prediction. The key insight is that joint upsampling across multiple feature levels recovers the spatial detail that dilated convolutions would otherwise preserve, at a fraction of their computational cost.
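After merging the multi-level features, the JPU captures multi-scale context by applying parallel convolutions with different dilation rates over the merged map. As a purely illustrative sketch (a plain, single-channel 3x3 dilated convolution rather than the paper's separable, multi-channel version), one such operation might look like:

```python
import numpy as np

def dilated_conv2d(x, w, rate):
    """Valid-mode 2D convolution of a (H, W) map with a 3x3 kernel w,
    where taps are spaced `rate` pixels apart (dilation). Larger rates
    enlarge the receptive field without adding parameters."""
    H, W = x.shape
    span = 2 * rate                     # total extent lost at the borders
    out = np.zeros((H - span, W - span))
    for i in range(3):
        for j in range(3):
            out += w[i, j] * x[i * rate : i * rate + H - span,
                               j * rate : j * rate + W - span]
    return out
```

Running several such convolutions with rates like 1, 2, 4, and 8 in parallel and fusing their outputs is what lets the module aggregate context at multiple scales from a single merged feature map.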
Results and Performance
The effectiveness of the proposed method is validated on benchmark datasets such as Pascal Context and ADE20K. With the JPU, FastFCN achieves state-of-the-art performance, reaching a mean Intersection over Union (mIoU) of 53.13% on the Pascal Context test set and a final score of 0.5584 on the ADE20K test set. Despite its simpler architecture, FastFCN also runs markedly faster than dilation-based counterparts, outperforming many existing methods in both speed and accuracy.
Implications and Future Directions
The implications of this work are significant for real-time semantic segmentation tasks where computational resources are limited. By integrating JPU, models can achieve fast inference speeds without compromising accuracy, making them suitable for deployment in real-world applications where latency is critical.
Theoretically, this work challenges the reliance on dilated convolutions and encourages further exploration into alternative methods of feature extraction and upsampling. Future research might focus on refining the JPU or developing other modules that further enhance the trade-off between efficiency and performance.
FastFCN represents a step forward in optimizing convolutional architectures for semantic segmentation, offering insights that future work on efficient segmentation models can build upon.