
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images (1912.10230v5)

Published 21 Dec 2019 in cs.CV

Abstract: Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image class labels only is not acceptable, but localising them at the original image pixel resolution is necessary. Boosted by the extraordinary ability of convolutional neural networks (CNN) in creating semantic, high level and hierarchical image features; several deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we mainly focus on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We started with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorised the approaches into three main periods, namely pre-and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analysed the solutions put forward in terms of solving the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all mentioned eras, with a summary of each approach that explains their contribution to the field. We conclude the survey by discussing the current challenges of the field and to what extent they have been solved.

Citations (165)

Summary

  • The paper reviews the evolution of semantic segmentation, grouping methods into three periods: the pre- and early deep learning era, the fully convolutional (FCN) era, and the post-FCN era.
  • It details innovations such as fully convolutional networks, encoder-decoder architectures, and attention mechanisms that significantly improve pixel-level segmentation.
  • The study outlines future directions, including neural architecture search, weakly-supervised learning, and few-shot learning, alongside the growing demand for real-time segmentation.

Overview of Deep Learning-Based Architectures for Semantic Segmentation on 2D Images

The paper "A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D Images", by Irem Ulku and Erdem Akagündüz, reviews the progression and current state of semantic segmentation, focusing mainly on deep learning methods applied to 2D images. It organizes the field chronologically into three periods: the pre- and early deep learning era, the fully convolutional (FCN) era, and the post-FCN era.

Key Insights from the Survey

The authors begin by motivating semantic segmentation through applications such as autonomous vehicles, medical image diagnosis, and robotics, and then review the public datasets used for training and benchmarking. Notable examples include PASCAL VOC, COCO, and Cityscapes, which range from general-purpose imagery to urban street scenes. The survey attributes much of the field's recent progress to the ability of convolutional neural networks (CNNs) to learn semantic, high-level, and hierarchical image features.
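
Benchmarks on datasets like these need a pixel-level score. The sketch below computes mean intersection-over-union (mIoU), a metric widely used on segmentation leaderboards. This summary does not say which metrics the survey covers, so the choice of mIoU, the function name `mean_iou`, and the toy label maps are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes that occur in pred or target.

    pred, target: equally sized 2D lists of integer class labels.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once towards both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps (hypothetical data): one pixel of class 0 is predicted as 1.
pred = [[0, 0, 1],
        [0, 1, 1]]
target = [[0, 0, 1],
          [0, 0, 1]]
print(mean_iou(pred, target, num_classes=2))
```

Classes that appear in neither map are skipped, so absent classes do not drag the average down.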

Pre-Deep Learning Era: Before deep learning, semantic segmentation relied on graphical models such as Markov Random Fields (MRFs) and Conditional Random Fields (CRFs), which model dependencies between pixels. These models performed inference over a graph structure but were held back by computational inefficiency and poor scalability.
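
As a rough illustration of how such graphical models score a labelling, the sketch below evaluates a Potts-style energy on a pixel grid: a per-pixel unary cost plus a fixed penalty for every pair of 4-connected neighbours with different labels. The Potts pairwise term, the name `potts_energy`, and the toy costs are assumptions chosen for illustration; this summary does not specify a particular formulation.

```python
def potts_energy(labels, unary, pairwise_weight=1.0):
    """Sum of unary costs plus a penalty per disagreeing 4-connected neighbour pair.

    labels: H x W grid of class indices.
    unary:  H x W x K grid, unary[y][x][k] = cost of giving pixel (y, x) label k.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for y in range(height):
        for x in range(width):
            energy += unary[y][x][labels[y][x]]
            # Count each horizontal and vertical edge exactly once.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                energy += pairwise_weight
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                energy += pairwise_weight
    return energy


# Hypothetical 2x2 image with two classes.
unary = [[[0.1, 0.9], [0.2, 0.8]],
         [[0.6, 0.4], [0.9, 0.1]]]
uniform = [[0, 0], [0, 0]]   # smooth labelling, higher unary cost
split = [[0, 0], [1, 1]]     # follows the unary costs, pays two edge penalties
print(potts_energy(uniform, unary), potts_energy(split, unary))
```

Inference means searching for the labelling with the lowest energy. That search is the step that becomes expensive on large images.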

Early Deep Learning Approaches: When deep learning was first applied to segmentation, basic convolutional networks achieved initial but limited success. They were constrained by architectural choices such as fully connected layers, which hindered real-time processing. Later approaches began to remove the fully connected layers.

FCN Era: The introduction of fully convolutional networks (FCNs) reshaped semantic segmentation by replacing fully connected layers with convolutional layers throughout. This improved efficiency and made it possible to segment images of arbitrary size, while skip connections combined coarse and fine feature maps for finer localization.
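
The two ideas in this paragraph can be sketched in a few lines of plain Python. A 1x1 convolution scores every pixel independently, so the output size follows the input size, and a skip connection adds upsampled coarse scores to finer ones. This is a simplified illustration, not the FCN implementation: nearest-neighbour upsampling stands in for whatever upsampling the original networks use, and the names `conv1x1`, `upsample2x`, and `fuse_skip` are hypothetical.

```python
def conv1x1(features, weights):
    """Per-pixel linear map from C_in channels to C_out class scores.

    features: H x W x C_in nested lists; weights: C_out x C_in nested lists.
    """
    return [[[sum(w * f for w, f in zip(class_w, pixel)) for class_w in weights]
             for pixel in row]
            for row in features]


def upsample2x(scores):
    """Nearest-neighbour 2x upsampling of an H x W x C score map (an assumption)."""
    out = []
    for row in scores:
        wide = [pixel for pixel in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                             # repeat each row
    return out


def fuse_skip(coarse_scores, fine_scores):
    """Skip connection: add upsampled coarse scores to scores at twice the resolution."""
    upsampled = upsample2x(coarse_scores)
    return [[[a + b for a, b in zip(pu, pf)] for pu, pf in zip(row_u, row_f)]
            for row_u, row_f in zip(upsampled, fine_scores)]


weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # 3 classes, 2 input channels
small = [[[1.0, 2.0]] * 3 for _ in range(2)]         # 2 x 3 feature map
large = [[[1.0, 2.0]] * 5 for _ in range(4)]         # 4 x 5 feature map
for fmap in (small, large):
    scores = conv1x1(fmap, weights)
    print(len(scores), len(scores[0]), len(scores[0][0]))
```

The same weights work on both feature maps because nothing in the layer depends on the spatial size. A fully connected layer, by contrast, fixes the input size.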

Post-FCN Era: Work after FCN has targeted finer label localization and scale invariance, using techniques such as encoder-decoder architectures, dilated convolutions, spatial pyramid pooling, and attention mechanisms. Object detection-based methods have also emerged, notably Mask R-CNN, which extends Faster R-CNN to perform instance segmentation.
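
Of the techniques listed, dilated convolution is the easiest to show in isolation. The sketch below is a one-dimensional version, assuming a "valid" (unpadded) convolution without kernel flipping. It inserts gaps between kernel taps so each output sees a wider stretch of the input with the same number of weights. The function names are hypothetical.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D cross-correlation with `dilation - 1` skipped samples between taps."""
    span = receptive_field(len(kernel), dilation)
    return [sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(signal) - span + 1)]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
print(dilated_conv1d(signal, kernel, dilation=1))  # each output spans 3 inputs
print(dilated_conv1d(signal, kernel, dilation=2))  # each output spans 5 inputs
```

With three weights in both cases, dilation 2 widens the span from 3 to 5 input positions. In this way dilated layers gather broader context without downsampling the feature map.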

Implications and Future Directions

The survey highlights several open challenges in semantic segmentation, chiefly how to combine global context with pixel-wise localization without sacrificing computational efficiency. Real-time applications demand both higher accuracy and higher speed, so methods must continue to evolve. The paper points to potential growth in weakly-supervised methods, domain adaptation strategies for synthetic datasets, and few-shot learning to generalize to unseen categories from minimal training samples.

The move towards automated architecture design, as seen in recent Neural Architecture Search (NAS) work, points to a future with less human intervention in model design and models that are easier to adapt and optimize for varied tasks.

In conclusion, the survey provides a thorough examination of existing approaches to semantic segmentation and identifies the advances still needed to address the field's central challenges in scene understanding. Future research will need designs that improve segmentation accuracy while keeping processing fast.