Image Segmentation Using Deep Learning: A Survey
The paper "Image Segmentation Using Deep Learning: A Survey" by Shervin Minaee et al. provides an extensive review of the state of the art in deep learning-based image segmentation. The survey covers the major deep learning frameworks and approaches employed for semantic and instance-level segmentation, thoroughly examining advanced architectures including encoder-decoder models, recurrent networks, multi-scale and pyramid methods, attention mechanisms, and generative adversarial networks (GANs). The efficacy of these models is evaluated on multiple benchmarks, with detailed numerical results illustrating their strengths and weaknesses.
Categories of Segmentation Models
The models discussed in the paper are categorized based on their architectural nuances:
- Fully Convolutional Networks (FCNs): FCNs replace fully connected layers with convolutional ones, allowing input images of arbitrary size. Long et al.’s pioneering work on FCNs proved vital, achieving semantic segmentation by producing output maps that preserve the spatial dimensions of the input. Although effective, FCNs often yield coarse segmentation boundaries because repeated downsampling in the network discards fine spatial detail.
- Convolutional Models with Graphical Models: Incorporating probabilistic graphical models such as conditional random fields (CRFs) can refine the coarse outputs of CNNs. This synergy compensates for the weak boundary localization inherent to pure CNN approaches.
- Encoder-Decoder Architectures: Models like SegNet, U-Net, and V-Net employ an encoder to capture context and a symmetric decoder for precise localization. These have been particularly successful in medical imaging scenarios.
- Multi-Scale and Pyramid Networks: These approaches, including PSPNet and DeepLab variants, enhance the network's ability to handle multiple scales and context by employing mechanisms like spatial pyramid pooling and dilated convolutions. These models significantly improve the segmentation of objects at various scales within a scene.
- R-CNN Based Models (Instance Segmentation): Frameworks like Mask R-CNN extend object detection to instance segmentation by predicting a mask for each detected object. This category excels in applications requiring detection and segmentation simultaneously.
- Dilated Convolutional Models: Dilated (atrous) convolutions are pivotal in the DeepLab family, enlarging the receptive field without adding parameters or reducing feature-map resolution, thus capturing multi-scale context effectively.
- Recurrent Neural Networks (RNNs): Recurrent structures address context propagation challenges in segmentation by modeling dependencies across spatial dimensions, although they are computationally intensive due to their sequential nature.
- Attention-Based Models: Attention mechanisms help in focusing on relevant parts of the image, which can significantly improve segmentation accuracy. Dual attention networks and context encoding models are prominent examples in this category.
- GANs and Adversarial Training: The integration of GANs with segmentation networks adds an adversarial loss which encourages the generation of more realistic segmentations. This paradigm, albeit computationally demanding, has shown robust performance improvements in segmentation tasks.
- Other Models: The paper underscores unique approaches like Deep Convolutional Active Contours, which integrate traditional active contour models with deep learning architectures to refine segment boundaries.
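To make the dilated-convolution idea concrete, here is a minimal 1-D sketch (illustrative only, not code from the survey): with dilation d, a kernel skips d − 1 inputs between taps, so a k-tap kernel spans (k − 1)·d + 1 input positions while still using only k weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D convolution (cross-correlation form) with dilated taps.

    A dilation of d inserts gaps of d - 1 samples between kernel taps,
    so a k-tap kernel covers (k - 1) * d + 1 input positions.
    """
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[i] * signal[start + i * dilation]
                       for i in range(len(kernel))))
    return out

# A 3-tap averaging-style kernel: dilation widens the context per output
# value without adding any weights.
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=1))  # spans 3 inputs
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2))  # spans 5 inputs
```

In real models (e.g., DeepLab) the same principle is applied in 2-D via the `dilation` argument of standard convolution layers, often with several dilation rates in parallel to aggregate multi-scale context.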
Performance Metrics and Evaluation
The paper provides a comprehensive evaluation of these models using widely adopted metrics such as Intersection over Union (IoU), mean IoU, and pixel accuracy. Tables report performance on standard datasets such as PASCAL VOC, Cityscapes, MS COCO, and ADE20K. For example, the DeepLabV3+ model achieves an outstanding mean IoU of 87.8% on the PASCAL VOC dataset, exemplifying the efficacy of encoder-decoder and dilated convolution frameworks.
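As a concrete illustration of these metrics, the following sketch (hypothetical helper names, not code from the survey) computes per-class IoU, mean IoU, and pixel accuracy over flat per-pixel label lists. IoU for a class is the ratio of pixels labeled that class by both prediction and ground truth (intersection) to pixels labeled that class by either (union); mean IoU averages over the classes present.

```python
def iou_per_class(pred, gt, num_classes):
    """Per-class IoU = |pred==c AND gt==c| / |pred==c OR gt==c|.

    Classes absent from both pred and gt get NaN (no union to divide by).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(inter / union if union else float('nan'))
    return ious

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes that appear in pred or gt (NaNs skipped)."""
    vals = [v for v in iou_per_class(pred, gt, num_classes) if v == v]
    return sum(vals) / len(vals)

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(1 for p, g in zip(pred, gt) if p == g) / len(gt)

# Toy 4-pixel image, 2 classes: one pixel of class 1 is mislabeled as 0.
pred = [0, 0, 1, 1]
gt   = [0, 1, 1, 1]
print(iou_per_class(pred, gt, 2))   # class 0: 1/2, class 1: 2/3
print(mean_iou(pred, gt, 2))
print(pixel_accuracy(pred, gt))     # 3 of 4 pixels correct
```

Benchmark implementations compute the same quantities from a confusion matrix for efficiency, but the definitions are identical.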
Implications and Future Directions
The implications of this survey are manifold. Practically, these advanced models can transform diverse domains, from autonomous driving to medical diagnostics. Theoretically, they highlight the immense potential and flexibility of deep learning paradigms in addressing complex segmentation tasks. However, challenges such as real-time processing, memory efficiency, and the need for large annotated datasets persist.
Looking forward, promising research directions include the development of more interpretable models to understand what these networks learn and how they make decisions. Additionally, weakly-supervised and unsupervised learning methods are crucial to reduce dependency on large labeled datasets. Real-time segmentation models will be critical for applications like autonomous systems and robotics. Moreover, advancing 3D point-cloud segmentation can significantly impact fields requiring detailed spatial understanding, such as urban planning and augmented reality.
In conclusion, this survey paper serves as an invaluable resource for researchers in computer vision by succinctly consolidating pivotal works and emerging trends in deep learning-based image segmentation.