Advancements in Image Classification using Convolutional Neural Network (1905.03288v1)

Published 8 May 2019 in cs.CV and cs.AI

Abstract: Convolutional Neural Network (CNN) is the state-of-the-art for the image classification task. Here we have briefly discussed different components of CNN. In this paper, we have explained different CNN architectures for image classification. Through this paper, we have shown advancements in CNN from LeNet-5 to the latest SENet model. We have discussed the model description and training details of each model. We have also drawn a comparison among those models.

Advancements in Image Classification using Convolutional Neural Networks

The paper "Advancements in Image Classification using Convolutional Neural Network" offers a comprehensive exploration of the evolution and capabilities of Convolutional Neural Networks (CNNs) in the field of image classification. The authors, Farhana Sultana, Abu Sufian, and Paramartha Dutta, review the progression from foundational models, like LeNet-5, through to contemporary architectures such as SENet, demonstrating how they each contribute to performance improvements in image classification tasks.

The evaluation begins with LeNet-5, a pioneering CNN developed for recognizing handwritten digits. Despite being groundbreaking for its time, LeNet-5 was constrained by the limited computational power and the scarcity of large datasets. The breakthrough came in 2012 with AlexNet, which capitalized on increased computational resources and a substantial dataset, ImageNet. By integrating Rectified Linear Units (ReLU) for non-linearity and dropout for regularization, AlexNet set a new benchmark for CNN performance, achieving an impressive reduction in error rates for classification tasks.
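The two AlexNet ingredients named above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the paper; the function names, array shapes, and dropout rate are arbitrary choices for the example:

```python
import numpy as np

def relu(x):
    # ReLU non-linearity: negatives become 0, positives pass through unchanged.
    return np.maximum(0.0, x)

def dropout(x, p=0.5, training=True, rng=None):
    # Inverted dropout: during training, randomly zero a fraction p of the
    # activations and rescale the survivors by 1/(1-p), so the expected
    # activation matches what the network sees at test time.
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))                       # negatives zeroed, positives kept
print(dropout(np.ones(6), p=0.5))    # roughly half the entries zeroed, rest scaled to 2.0
```

Inverted (train-time) scaling is one common convention; the original AlexNet instead scaled activations at test time, which is mathematically equivalent in expectation.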

Following AlexNet, ZFNet introduced optimizations such as smaller filter sizes in the early layers, guided by visualization techniques that revealed what intermediate layers learn, while VGGNet demonstrated that depth could significantly impact performance. VGGNet employed small 3×3 filters in deep architectures, which improved performance markedly over shallower networks.
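The appeal of small filters can be checked with quick arithmetic (the channel width of 64 below is an arbitrary example, not a figure from the paper): two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, yet use fewer weights and interleave an extra non-linearity.

```python
def conv_params(k, c_in, c_out):
    # Weight count of one k x k convolution layer (biases ignored).
    return k * k * c_in * c_out

c = 64  # example channel width
two_3x3 = 2 * conv_params(3, c, c)  # two stacked 3x3 layers: 18 * c^2
one_5x5 = conv_params(5, c, c)      # one 5x5 layer:          25 * c^2
print(two_3x3, one_5x5)  # 73728 102400
```

The saving grows with channel width, since both counts scale with c².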

The arrival of GoogLeNet marked a deviation from traditional CNN approaches with its introduction of the Inception module, which processes the input at multiple scales in parallel, demonstrating superior performance with reduced computational resource requirements. The subsequent introduction of the residual learning framework in ResNet addressed the degradation problem in very deep networks, presenting an architecture that could train networks beyond 100 layers without accuracy loss, further extending CNN capabilities.
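The core residual idea can be sketched in a few lines (an illustrative sketch in NumPy, not the paper's code): the stacked layers learn a residual function f(x), and an identity shortcut adds the input back, so a block that learns f ≈ 0 reduces to the identity mapping.

```python
import numpy as np

def residual_block(x, f):
    # Residual learning: y = f(x) + x, where f is the learned residual
    # (in a real network, a couple of conv + batch-norm + ReLU layers).
    return f(x) + x

x = np.array([1.0, 2.0, 3.0])

# If the residual is zero, the block is exactly the identity mapping --
# which is why adding residual layers cannot worsen a shallower solution.
y = residual_block(x, lambda v: np.zeros_like(v))
print(y)  # [1. 2. 3.]
```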

DenseNet extended these improvements by promoting extensive feature reuse through dense connections, wherein each layer receives inputs from all preceding layers. This approach reduced the number of parameters while enhancing the network's ability to propagate features and gradients.
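Because each layer concatenates all preceding feature maps, the input width of a dense block grows linearly with depth. A small sketch of that bookkeeping (the channel numbers below are illustrative DenseNet-style values, not figures quoted in the paper):

```python
def dense_block_channels(k0, growth_rate, num_layers):
    # Layer l in a dense block receives the input plus all l previous
    # feature maps, so its input width is k0 + growth_rate * l.
    # Returns the input width seen by each layer, plus the final output width.
    return [k0 + growth_rate * l for l in range(num_layers + 1)]

# e.g. a 64-channel input, growth rate k = 32, 6 layers in the block
print(dense_block_channels(64, 32, 6))  # [64, 96, 128, 160, 192, 224, 256]
```

Each layer only needs to produce `growth_rate` new channels, which is how DenseNet keeps its parameter count low despite the heavy connectivity.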

CapsNets attempted to resolve the limitations posed by pooling layers in traditional CNNs by introducing capsules that retain spatial hierarchies between features. Despite achieving state-of-the-art results on simpler datasets like MNIST, CapsNets have yet to demonstrate competitive performance on more complex datasets.
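One concrete piece of the capsule idea is its "squash" non-linearity, which preserves a vector's direction while compressing its length into [0, 1) so that length can represent the probability that an entity exists. A minimal NumPy sketch (illustrative, with my own epsilon choice for numerical safety):

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule squash: keep the direction of s, shrink its length to
    # ||s||^2 / (1 + ||s||^2), which lies in [0, 1).
    norm_sq = float(np.sum(s * s))
    return (norm_sq / (1.0 + norm_sq)) * s / (np.sqrt(norm_sq) + eps)

v = squash(np.array([3.0, 4.0]))   # input length 5 -> output length 25/26
print(np.linalg.norm(v))
```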

Finally, SENet, with its "Squeeze-and-Excitation" blocks, achieves top performance by adaptively recalibrating channel-wise feature responses, dynamically refining the model's focus and setting a new standard in classification accuracy.
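The squeeze-recalibrate pipeline can be sketched in NumPy. This is an illustrative sketch with random weights, not the paper's implementation; the channel count, reduction ratio, and weight shapes are my own example choices:

```python
import numpy as np

def se_block(x, w1, w2):
    # x: feature maps of shape (C, H, W).
    # Squeeze: global average pooling -> one descriptor per channel.
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck MLP (reduce by ratio r, then expand) + sigmoid,
    # producing a gate in (0, 1) for each channel.
    s = np.maximum(0.0, w1 @ z)                  # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))          # shape (C,)
    # Recalibrate: rescale each feature map by its learned gate.
    return x * s[:, None, None]

C, r = 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, 6, 6))
w1 = rng.standard_normal((C // r, C))  # squeeze (reduction) weights
w2 = rng.standard_normal((C, C // r))  # excitation (expansion) weights
y = se_block(x, w1, w2)
print(y.shape)  # (8, 6, 6)
```

Because the block only reads and rescales channels, it can be dropped into existing architectures (e.g. after a residual block) at a small parameter cost of 2C²/r per block.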

The paper concludes by observing that each of these architectures has leveraged differing methodological innovations and optimizations to improve the accuracy and efficiency of image classification tasks. The integration of novel components, such as residual connections and dense blocks, has pushed CNNs further, providing an effective toolset for modern computer vision applications.

In terms of implications, these advancements reinforce CNNs as a pivotal technology in AI, particularly in areas demanding high-fidelity image classification. As architectures continue to evolve, we can anticipate further enhancement in both computational efficiency and classification accuracy, potentially extending the utility of CNNs to even more complex tasks in diverse domains. Future research may explore hybrid models or novel training regimes that could offer further gains in performance or enable new applications of deep learning technologies.

Authors (3)
  1. Farhana Sultana (3 papers)
  2. A. Sufian (7 papers)
  3. Paramartha Dutta (12 papers)
Citations (253)