Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation (1809.06323v3)

Published 17 Sep 2018 in cs.CV

Abstract: Real-time semantic segmentation plays an important role in practical applications such as self-driving and robots. Most semantic segmentation research focuses on improving estimation accuracy with little consideration on efficiency. Several previous studies that emphasize high-speed inference often fail to produce high-accuracy segmentation results. In this paper, we propose a novel convolutional network named Efficient Dense modules with Asymmetric convolution (EDANet), which employs an asymmetric convolution structure and incorporates dilated convolution and dense connectivity to achieve high efficiency at low computational cost and model size. EDANet is 2.7 times faster than the existing fast segmentation network, ICNet, while it achieves a similar mIoU score without any additional context module, post-processing scheme, and pretrained model. We evaluate EDANet on Cityscapes and CamVid datasets, and compare it with the other state-of-art systems. Our network can run with the high-resolution inputs at the speed of 108 FPS on one GTX 1080Ti.

Summary

The paper introduces EDANet, which achieves real-time semantic segmentation by leveraging asymmetric convolutions to reduce computations while maintaining 67.3% mIoU on Cityscapes.
It employs dense connectivity to aggregate multi-scale features and dilated convolutions to enlarge the receptive field without reducing feature resolution.
Experiments on Cityscapes and CamVid show that EDANet balances speed (up to 108 FPS on one GTX 1080Ti) and accuracy, which makes it a candidate for practical autonomous systems.

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

This paper presents EDANet, a network for real-time semantic segmentation built from efficient dense modules that use asymmetric convolution. Real-time semantic segmentation matters for applications such as autonomous driving and robotics, where both speed and accuracy are crucial. High-accuracy models typically impose large computational demands, which makes them less viable for real-time use, while models that favor speed often give up accuracy. EDANet aims to balance the two: it keeps computational cost and model size low while reaching accuracy comparable to slower networks.

Key Features of EDANet

EDANet employs several notable techniques to achieve its performance:

Asymmetric Convolution: A standard n×n 2D convolution is decomposed into two sequential 1D convolutions (n×1 followed by 1×n). For a 3×3 kernel with equal input and output channels, this cuts the weight count by one third, and the ablation results indicate that accuracy is maintained.
Dense Connectivity: Inspired by DenseNet, each module receives the concatenated features of the preceding modules, which allows multi-scale features to be aggregated. Similar connectivity was first applied to image classification; the competitive accuracy of EDANet suggests that it also benefits semantic segmentation.
Dilated Convolution: Dilated convolutions enlarge the receptive field without reducing feature resolution, so EDANet gains context at no extra parameter cost. This is especially pertinent to segmentation, where losing spatial information degrades the output.
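
The three techniques above can be made concrete with some simple arithmetic. The following is a minimal sketch, not the authors' implementation: the function names, the channel count of 40, the growth rate of 40, the starting width of 60, and the dilation rates are all illustrative assumptions chosen only to show how the formulas behave. Biases and any normalization layers are ignored.

```python
# Illustrative arithmetic only; all concrete numbers are assumptions, not paper values.

def conv_weights(c_in, c_out, kh, kw):
    """Weight count of one 2D convolution layer (bias ignored)."""
    return c_in * c_out * kh * kw


def standard_vs_asymmetric(channels, k=3):
    """Compare one k x k convolution with a k x 1 followed by a 1 x k convolution."""
    standard = conv_weights(channels, channels, k, k)
    asymmetric = (conv_weights(channels, channels, k, 1)
                  + conv_weights(channels, channels, 1, k))
    return standard, asymmetric


def effective_kernel(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(k, dilations):
    """Receptive field of a stack of stride-1 convolutions, one per dilation rate."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d
    return rf


def dense_channels(c_in, growth, n_modules):
    """Channel count after each densely connected module.

    Each module concatenates `growth` new feature maps onto its input.
    """
    return [c_in + i * growth for i in range(n_modules + 1)]


if __name__ == "__main__":
    std, asym = standard_vs_asymmetric(40)
    print(f"3x3 weights: {std}, 3x1 + 1x3 weights: {asym}, ratio: {asym / std:.2f}")
    print("effective kernel, k=3, dilation=2:", effective_kernel(3, 2))
    print("receptive field, dilations 1,2,4:", receptive_field(3, [1, 2, 4]))
    print("channels per dense module:", dense_channels(60, 40, 3))
```

The weight ratio is 2/k, so the saving grows with kernel size, while dilation widens the receptive field without adding any weights at all.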

Empirical Results

EDANet is validated on the Cityscapes and CamVid datasets. It reaches speeds of up to 108 FPS on a GTX 1080Ti while securing mIoU scores comparable to existing methods that demand significantly more computation. Specifically, EDANet attains 67.3% mIoU on the Cityscapes test set, evidence that it maintains accuracy despite prioritizing efficiency.
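
For readers unfamiliar with the two reported metrics, the sketch below shows how mean intersection-over-union (mIoU) is conventionally computed from a confusion matrix and how a frame rate translates into per-frame latency. It is a generic illustration under stated assumptions, not the paper's evaluation code: the function names and the toy 2-class confusion matrix are hypothetical, and classes absent from both prediction and ground truth are simply skipped.

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix.

    conf[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # class c pixels predicted as other classes
        fp = sum(conf[r][c] for r in range(n)) - tp   # other classes predicted as class c
        denom = tp + fp + fn
        if denom > 0:                                 # skip classes that never occur
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def frame_time_ms(fps):
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    toy = [[3, 1],
           [1, 5]]  # hypothetical 2-class confusion matrix
    print(f"mIoU: {mean_iou(toy):.4f}")
    print(f"108 FPS is about {frame_time_ms(108):.2f} ms per frame")
```

At 108 FPS the network has a little over 9 ms to process each frame, which gives a sense of the latency budget implied by the reported speed.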

Ablation studies on variants of EDANet shed light on the contribution of each component. The asymmetric convolution structure matched the accuracy of the non-asymmetric variants at reduced computational cost. Dense connectivity gave a measurable accuracy improvement over traditional residual connections. Adding complex context modules or decoders improved accuracy only marginally and at a steep computational price, which supports the design choice of leaving them out.

Implications and Future Directions

EDANet opens several avenues for further investigation. Practically, it may be most useful in real-time systems where computational resources are limited. Theoretically, it prompts discussion of the trade-off between architectural complexity and model efficiency: EDANet illustrates that a simple network design can still reach competitive accuracy.

Future research could explore optimization strategies for other tasks, scalability across different hardware, and broader applications beyond semantic segmentation. The paper reflects a continuing interest in simplifying model architectures while retaining a competitive edge, a direction consistent with the broader goal of improving computational efficiency in artificial intelligence.
