Recent progress in semantic image segmentation (1809.10198v1)

Published 20 Sep 2018 in cs.CV

Abstract: Semantic image segmentation, which has become one of the key applications in image processing and computer vision, is used in multiple domains such as medicine and intelligent transportation. Many benchmark datasets have been released for researchers to verify their algorithms. Semantic segmentation has been studied for many years, and since the emergence of Deep Neural Networks (DNNs) it has made tremendous progress. In this paper, we divide semantic image segmentation methods into two categories: traditional methods and recent DNN-based methods. First, we briefly summarize the traditional methods and the datasets released for segmentation; then we comprehensively investigate recent DNN-based methods, described in eight aspects: fully convolutional networks, upsampling methods, FCNs combined with CRF methods, dilated convolution approaches, progress in backbone networks, pyramid methods, multi-level feature and multi-stage methods, and supervised, weakly-supervised and unsupervised methods. Finally, we draw a conclusion for this area.

Citations (399)


Summary

The paper is a comprehensive review that contrasts traditional feature-based methods with modern DNN techniques aimed at improving segmentation accuracy and speed.
It covers innovations such as fully convolutional networks, dilated convolutions, and pyramid pooling, which capture multi-scale contextual information.
The review reports that integrating CRFs with DNN layers refines object boundaries and improves segmentation quality.

Semantic Image Segmentation: Traditional Methods and DNN Advancements

This paper provides a comprehensive overview of semantic image segmentation methods, which have become central to image processing and computer vision. It draws a clear line between traditional methods and the advances made possible by Deep Neural Networks (DNNs). The authors systematically categorize and review existing approaches, providing a useful resource for researchers and practitioners who want to improve segmentation accuracy and computational efficiency.

Traditional Methods

Prior to the resurgence of DNNs, semantic image segmentation relied heavily on feature extraction and classification techniques. Traditional methods often employed a variety of features such as pixel color, Histogram of Oriented Gradients (HOG), and Scale-Invariant Feature Transform (SIFT). Techniques like K-means clustering and SVMs were prevalent, alongside energy-based models and edge detection. The use of Markov Random Fields (MRF) and Conditional Random Fields (CRF) also marked an important era in traditional semantic image segmentation, achieving considerable success in various application domains.

Datasets and Evaluation Metrics

The paper devotes significant attention to the datasets and evaluation metrics central to semantic segmentation research. The authors list prominent datasets including PASCAL VOC, MS COCO, ADE20K, Cityscapes, and KITTI. These datasets provide a foundation for evaluating segmentation algorithms across diverse scenarios, from everyday scenes to autonomous driving. The paper also describes common evaluation metrics, namely Pixel Accuracy, Mean Accuracy, and Intersection over Union (IoU), which remain standard for assessing segmentation performance.
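As a rough illustration of the three metrics named above, the sketch below computes them from a confusion matrix using their commonly used definitions. The function names and the toy label maps are hypothetical, and skipping classes that never occur is an assumption made here, not something the paper specifies.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def segmentation_metrics(truth, pred, num_classes):
    """Return pixel accuracy, mean per-class accuracy, and mean IoU."""
    cm = confusion_matrix(truth, pred, num_classes)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(num_classes))

    per_class_acc = []
    per_class_iou = []
    for c in range(num_classes):
        gt_pixels = sum(cm[c])                                   # true class is c
        pred_pixels = sum(cm[r][c] for r in range(num_classes))  # predicted as c
        union = gt_pixels + pred_pixels - cm[c][c]
        # Assumption: classes absent from the ground truth (or from both maps)
        # are left out of the corresponding average.
        if gt_pixels > 0:
            per_class_acc.append(cm[c][c] / gt_pixels)
        if union > 0:
            per_class_iou.append(cm[c][c] / union)

    return {
        "pixel_accuracy": correct / total,
        "mean_accuracy": sum(per_class_acc) / len(per_class_acc),
        "mean_iou": sum(per_class_iou) / len(per_class_iou),
    }


# Toy example: a 2x3 label map flattened to a list, with three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
metrics = segmentation_metrics(truth, pred, num_classes=3)
print(metrics)
```

In this toy case four of six pixels are correct, so pixel accuracy is 2/3, while the mean IoU is lower (0.5) because every misclassified pixel enlarges the union of two classes at once.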

Recent Advances in DNN-based Segmentation

With the introduction of DNNs, particularly Convolutional Neural Networks (CNNs), the authors describe a substantial leap in segmentation capabilities.

Fully Convolutional Networks (FCN): Replacing fully connected layers with convolutional ones revolutionized semantic segmentation. FCNs, which employ skip connections, can be trained end to end and yield spatially coherent outputs. The paper notes that FCNs significantly advanced segmentation benchmarks.
Upsampling Techniques: The paper discusses bilinear interpolation and deconvolution, which restore the spatial resolution lost during downsampling. These methods bring feature maps back to the input size and recover finer detail in them.
Integration with CRF and Traditional Methods: By integrating CRFs with DNN layers, methods like DeepLab achieve finer localization and boundary refinement in their segmentation results. The paper highlights the efficiency gains realized through these hybrid approaches.
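Dilated Convolutions: Dilated (atrous) convolutions enlarge the receptive field without reducing resolution or adding parameters. When dilation rates grow exponentially across stacked layers, the receptive field grows exponentially as well, which makes them particularly effective in dense prediction tasks.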
Backbone Networks: The paper details enhancements to backbone networks, such as the progression from VGGNet to ResNet and ResNeXt, which underpin segmentation models and optimize feature extraction.
Pyramid Methods: Pyramid approaches, including image pyramids and Atrous Spatial Pyramid Pooling (ASPP), address the limits of single-scale feature extraction by integrating multi-scale context, which makes segmentation more robust to scale variations.
Multi-level and Multi-stage Strategies: Hypercolumns and deep layer cascades offer ways to exploit feature hierarchies, combining local and global information for precise pixel classification.
Learning Paradigms: While most advancements rely on supervised learning, the paper also acknowledges efforts in weakly-supervised and unsupervised segmentation, which are crucial for scaling up to datasets with limited labels.
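To make the dilated-convolution item in the list above concrete, here is a minimal sketch in plain Python: a one-dimensional dilated convolution and a helper that computes the receptive field of stacked stride-1 layers. The function names and the toy signal are illustrative assumptions, not code from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D dilated (atrous) cross-correlation.

    Kernel taps are spaced `dilation` samples apart, so the kernel covers a
    wider stretch of the input without gaining any extra weights.
    """
    span = (len(kernel) - 1) * dilation + 1  # input samples covered per output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(
            sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        )
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field, in input samples, of stacked stride-1 dilated layers."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


signal = list(range(10))
summed = dilated_conv1d(signal, [1, 1, 1], dilation=2)
print(summed)                           # each output sums samples i, i+2, i+4
print(receptive_field(3, [1, 2, 4]))    # exponentially growing dilation rates
print(receptive_field(3, [1, 1, 1]))    # ordinary convolutions, for comparison
```

With kernel size 3 and dilations 1, 2, 4, three layers cover 15 input samples, compared with 7 for three undilated layers, and no pooling or striding is needed to get there.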

Implications and Future Directions

Taken together, these methods point towards highly accurate and computationally efficient semantic segmentation. Combining advanced DNN techniques with traditional models continues to drive improvements in benchmark scores and processing speeds. Future work is likely to focus on lightweight models for real-time applications and on better unsupervised learning methods that further reduce reliance on large annotated datasets.

In conclusion, this paper provides a valuable synthesis of traditional and contemporary approaches to semantic image segmentation. As research in this area progresses, leveraging these insights will be instrumental in addressing the complex challenges posed by diverse image contexts across various application domains.
