CGNet: A Light-weight Context Guided Network for Semantic Segmentation (1811.08201v2)

Published 20 Nov 2018 in cs.CV

Abstract: The demand of applying semantic segmentation model on mobile devices has been increasing rapidly. Current state-of-the-art networks have enormous amount of parameters hence unsuitable for mobile devices, while other small memory footprint models follow the spirit of classification network and ignore the inherent characteristic of semantic segmentation. To tackle this problem, we propose a novel Context Guided Network (CGNet), which is a light-weight and efficient network for semantic segmentation. We first propose the Context Guided (CG) block, which learns the joint feature of both local feature and surrounding context, and further improves the joint feature with the global context. Based on the CG block, we develop CGNet which captures contextual information in all stages of the network and is specially tailored for increasing segmentation accuracy. CGNet is also elaborately designed to reduce the number of parameters and save memory footprint. Under an equivalent number of parameters, the proposed CGNet significantly outperforms existing segmentation networks. Extensive experiments on Cityscapes and CamVid datasets verify the effectiveness of the proposed approach. Specifically, without any post-processing and multi-scale testing, the proposed CGNet achieves 64.8% mean IoU on Cityscapes with less than 0.5 M parameters. The source code for the complete system can be found at https://github.com/wutianyiRosun/CGNet.



Summary

The paper introduces the Context Guided (CG) block that jointly learns local features and contextual information to enhance segmentation accuracy with minimal complexity.
The model employs a streamlined three-stage down-sampling architecture and deep-and-thin design, significantly reducing computational overhead while maintaining competitive performance.
CGNet's efficient design and contextual attention mechanism enable deployment on mobile devices, paving the way for real-time applications in autonomous driving and augmented reality.

Insights into CGNet: A Lightweight Solution for Semantic Segmentation on Mobile Devices

Overview

The paper presents an approach to semantic segmentation that prioritizes efficiency and suitability for mobile devices. It introduces the Context Guided Network (CGNet), a lightweight model that substantially reduces the computational demands typical of state-of-the-art segmentation networks while maintaining competitive accuracy. Its defining component is the Context Guided (CG) block, which combines local features with surrounding and global context to improve feature learning and segmentation accuracy.

Key Contributions

The primary innovation of this work, the CG block, jointly learns a local feature and its surrounding context. Many small models follow architectures designed for image classification and so neglect the specific requirements of semantic segmentation; the CG block is instead designed for the segmentation task. It combines three steps: local feature extraction, surrounding-context extraction via dilated convolutions, and refinement of the resulting joint feature with global context through a weighting (attention-style) mechanism. Because CGNet is built from CG blocks, it captures contextual information at every stage of the network while keeping model complexity low.
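
The data flow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (conv3x3, global_context_gate, cg_block), the per-channel 3x3 filters, the single linear layer, and the sigmoid gate are all assumptions chosen for brevity, and normalization, activations, and any residual connection are omitted.

```python
import math
import random

def conv3x3(x, weights, dilation=1):
    """Per-channel 3x3 convolution with zero padding; x is [C][H][W].
    Filtering each channel independently is a simplifying assumption."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for ki in range(3):
                    for kj in range(3):
                        # Dilation spreads the 3x3 taps further apart.
                        ii = i + (ki - 1) * dilation
                        jj = j + (kj - 1) * dilation
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += weights[c][ki][kj] * x[c][ii][jj]
                out[c][i][j] = acc
    return out

def global_context_gate(x, fc_weights):
    """Global average pooling, one linear layer, then a sigmoid.
    Returns one weight in (0, 1) per output channel (assumed gate form)."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    gates = []
    for row in fc_weights:
        z = sum(w * p for w, p in zip(row, pooled))
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return gates

def cg_block(x, w_loc, w_sur, fc_weights, dilation=2):
    local = conv3x3(x, w_loc, dilation=1)            # local feature
    surround = conv3x3(x, w_sur, dilation=dilation)  # surrounding context
    joint = local + surround                         # concatenate channels: 2C
    gates = global_context_gate(joint, fc_weights)   # global context
    # Reweight every channel of the joint feature by its gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, joint)]

random.seed(0)
C, H, W = 2, 5, 5

def rand_kernels(n):
    return [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
            for _ in range(n)]

x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
fc = [[random.uniform(-1, 1) for _ in range(2 * C)] for _ in range(2 * C)]
y = cg_block(x, rand_kernels(C), rand_kernels(C), fc)
print(len(y), len(y[0]), len(y[0][0]))  # 4 5 5
```

The output keeps the input's spatial size and doubles the channel count, because the local and surrounding features are concatenated before the global gate rescales each channel.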

CGNet also uses a streamlined architecture with three down-sampling stages, rather than the five found in methods derived from classification backbones such as ResNet or DenseNet. Its depth and channel widths are chosen to promote efficiency without sacrificing performance, a principle the paper terms "deep and thin." Together, these choices keep the model compact and amenable to mobile deployment.
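
To make the three-versus-five comparison concrete, the following sketch computes the feature-map size that remains after a given number of stride-2 down-sampling stages. The helper name feature_map_size is hypothetical, and the 1024 x 2048 input (the native Cityscapes image size) is used only as an example; the paper's actual training and test resolutions are not stated in this summary.

```python
def feature_map_size(height, width, num_stages, stride=2):
    """Spatial size after `num_stages` down-sampling steps, each dividing
    the resolution by `stride` and rounding up (as a padded 3x3,
    stride-2 convolution would)."""
    for _ in range(num_stages):
        height = -(-height // stride)  # ceiling division
        width = -(-width // stride)
    return height, width

for stages in (3, 5):
    h, w = feature_map_size(1024, 2048, stages)
    print(f"{stages} stages: {h} x {w} (1/{2 ** stages} of the input side length)")
# 3 stages: 128 x 256 (1/8 of the input side length)
# 5 stages: 32 x 64 (1/32 of the input side length)
```

With three stages the final feature map has 16 times as many spatial positions as with five, so the network retains a much finer grid on which to predict per-pixel labels.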

Experimental Results

The paper demonstrates CGNet's competitive performance through extensive evaluations on the Cityscapes and CamVid datasets, two popular semantic segmentation benchmarks. CGNet achieves a mean Intersection-over-Union (IoU) of 64.8% on Cityscapes with fewer than 0.5 million parameters. This result is obtained without post-processing or multi-scale testing, techniques that are often used to raise accuracy but add complexity and computational cost.
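
For readers unfamiliar with the metric, here is a minimal sketch of mean IoU computed from a confusion matrix, assuming the standard per-class definition TP / (TP + FP + FN) averaged over classes. The function name mean_iou and the three-class matrix are made up for illustration and are not results from the paper.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix, where confusion[t][p]
    counts pixels of true class t that were predicted as class p."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # wrongly labeled as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Illustrative 3-class confusion matrix (rows: ground truth, columns: prediction).
confusion = [
    [50, 5, 0],
    [10, 30, 5],
    [0, 5, 20],
]
print(f"mean IoU = {mean_iou(confusion):.4f}")
```

Because every class contributes equally to the average, a model cannot reach a high mean IoU by segmenting only the large, frequent classes well.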

Compared with other models, CGNet is markedly more efficient. Its FLOPs and memory usage are considerably lower than those of larger systems such as PSPNet or DenseASPP, while its segmentation accuracy remains competitive. Against lightweight alternatives such as ENet and ESPNet, CGNet delivers higher accuracy, which supports the claim that its design balances accuracy and cost well.
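
The parameter and FLOP figures behind such comparisons follow from simple per-layer arithmetic. The sketch below counts weights and multiply-accumulate operations (MACs) for a single convolution layer. The helper names and the layer widths are illustrative assumptions, not CGNet's actual configuration, and the grouped (channel-wise) variant is shown as a general cost-saving technique rather than as a statement about the paper's layers.

```python
def conv_params(c_in, c_out, kernel=3, groups=1, bias=False):
    """Number of learnable weights in a 2-D convolution layer."""
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)

def conv_macs(c_in, c_out, out_h, out_w, kernel=3, groups=1):
    """Multiply-accumulate operations: one weight use per output position."""
    return conv_params(c_in, c_out, kernel, groups) * out_h * out_w

# Hypothetical layers, all 3x3, evaluated on a 128 x 256 feature map.
layers = [
    ("wide, 512 -> 512", 512, 512, 1),
    ("thin, 64 -> 64", 64, 64, 1),
    ("thin, channel-wise 64 -> 64", 64, 64, 64),
]
for name, c_in, c_out, groups in layers:
    params = conv_params(c_in, c_out, groups=groups)
    macs = conv_macs(c_in, c_out, 128, 256, groups=groups)
    print(f"{name}: {params:,} params, {macs:,} MACs")
```

A single wide 512-channel layer already holds 2,359,296 weights, several times the 0.5 million budget of the whole network, while a 64-channel layer holds 36,864. This is the arithmetic that makes a thin design attractive when the parameter budget is tight.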

Implications and Future Directions

The main implication is practical: semantic segmentation becomes easier to deploy in resource-constrained environments such as mobile and edge devices. This opens avenues for real-time applications in areas like autonomous driving, augmented reality, and robotics, where both fast inference and accuracy are critical.

Future work could integrate techniques such as neural architecture search (NAS) or improved contextual attention mechanisms to extend CGNet's capabilities. Adaptive models that adjust their computation dynamically based on input complexity are another possible direction.

Conclusion

Overall, the CGNet paper contributes valuable insights into the pursuit of efficient semantic segmentation. By addressing the distinctive characteristics of the task with a well-suited architectural approach, this work paves the way for broader adoption of semantic segmentation models in practical applications. Through its emphasis on lightweight design paired with competitive performance, CGNet exemplifies how innovation in model architecture can drive progress in the field of computer vision.