
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends (2402.15490v2)

Published 23 Feb 2024 in cs.LG and cs.NE

Abstract: In today's digital age, Convolutional Neural Networks (CNNs), a subset of Deep Learning (DL), are widely used for various computer vision tasks such as image classification, object detection, and image segmentation. There are numerous types of CNNs designed to meet specific needs and requirements, including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention, and depthwise convolutions, and NAS, among others. Each type of CNN has a unique structure and characteristics that make it suitable for specific tasks. A thorough understanding and comparative analysis of these different CNN types is crucial for appreciating their strengths and weaknesses. Furthermore, studying the performance, limitations, and practical applications of each type of CNN can aid the development of new and improved architectures in the future. We also examine, from various perspectives, the platforms and frameworks that researchers use for research and development. Additionally, we explore the main research fields of CNNs, such as 6D vision, generative models, and meta-learning. This survey provides a comprehensive examination and comparison of various CNN architectures, highlighting their architectural differences and emphasizing their respective advantages, disadvantages, applications, challenges, and future trends.

A Comprehensive Survey of Convolutions in Deep Learning: Navigating Architectures, Challenges, and Innovations

Introduction to Convolutional Neural Networks

The domain of deep learning, particularly Convolutional Neural Networks (CNNs), has witnessed significant evolution over the past decade. Traditionally renowned for their efficacy in computer vision tasks, CNNs have diversified their application scope to areas such as NLP, audio signal processing, and medical image analysis. This survey explores the myriad convolution types integral to CNNs, including traditional 2D convolutions, depthwise separable, dilated, and grouped convolutions, among others. Each type's unique structural attributes, computational implications, and suitability for specific tasks are thoroughly discussed.

Overview of Convolutional Techniques

The core of CNNs—convolutions—serves as the primary mechanism for feature extraction across various data formats. Understanding the intricacies of different convolution types is pivotal for designing architectures optimized for performance and efficiency. This includes recognizing the utility of 1D convolutions for time-series data and audio signals, 3D convolutions for volumetric data, and more specialized forms like transposed convolutions for tasks requiring upsampling. The survey underscores the importance of selecting appropriate convolution methods to maximize accuracy, reduce computational requirements, and accommodate the memory constraints of deployment environments.
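To make these distinctions concrete, here is a minimal sketch in PyTorch (one of the frameworks the survey covers; the channel counts and input shapes are illustrative assumptions, not values from the paper) showing how input dimensionality drives the choice of convolution, and how a transposed convolution performs upsampling:

```python
import torch
import torch.nn as nn

# 1D for sequential data (audio, time series), 2D for images,
# 3D for volumetric data, transposed 2D for upsampling.
conv1d  = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
conv2d  = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
dilated = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3,
                    padding=2, dilation=2)  # same output size, wider receptive field
conv3d  = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
deconv  = nn.ConvTranspose2d(in_channels=8, out_channels=3, kernel_size=2, stride=2)

audio  = torch.randn(1, 1, 16000)       # one second of 16 kHz audio
image  = torch.randn(1, 3, 224, 224)    # an RGB image
volume = torch.randn(1, 1, 64, 64, 64)  # e.g. a CT sub-volume

print(conv1d(audio).shape)          # torch.Size([1, 8, 16000])
print(conv2d(image).shape)          # torch.Size([1, 8, 224, 224])
print(dilated(image).shape)         # torch.Size([1, 8, 224, 224])
print(conv3d(volume).shape)         # torch.Size([1, 8, 64, 64, 64])
print(deconv(conv2d(image)).shape)  # torch.Size([1, 3, 448, 448]), 2x upsampled
```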

Advanced Convolutional Techniques and Their Applications

Beyond basic convolution operations, advanced techniques have emerged, broadening CNNs' applicability and efficiency. Notably, depthwise separable convolutions have gained popularity for their reduced parameter count, proving indispensable in mobile and low-resource applications. Similarly, spatial pyramid pooling and attention mechanisms within convolutions facilitate handling inputs of varying sizes and focusing computational resources on regions of interest. These advancements not only enhance model performance but also address longstanding challenges such as model scalability and interpretability.
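As a rough illustration of the parameter savings the survey attributes to depthwise separable convolutions, the sketch below (again PyTorch; the 64-to-128 channel sizes are arbitrary assumptions) factors a standard 3x3 convolution into a per-channel depthwise step followed by a 1x1 pointwise step:

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# Standard 3x3 convolution mapping 64 channels to 128.
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Depthwise separable factorization: a per-channel 3x3 convolution
# (groups == in_channels) followed by a 1x1 pointwise channel mixer.
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise
)

print(count_params(standard))   # 73856 = 64*128*3*3 + 128 biases
print(count_params(separable))  # 8960  = (64*3*3 + 64) + (64*128 + 128)
```

That is roughly an 8x reduction for the same input/output shape, which is why this factorization underpins mobile-oriented architectures such as MobileNet.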

Performance Considerations and Trends

Performance and efficiency considerations remain at the forefront of CNN evolution. The survey discusses strategies to balance accuracy with inference speed and model size, highlighting methods like model pruning, quantization, and leveraging mixed-precision computations. It also points to emerging trends, such as incorporating self-supervised learning for pretraining models and exploring architectures that blend CNNs with transformer models for enriched feature representation.
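Two of the strategies mentioned above, magnitude pruning and post-training quantization, are available directly in PyTorch. The snippet below is a minimal sketch on a toy model (the layer sizes and the 50% sparsity level are illustrative assumptions), not a recipe from the paper:

```python
import io
import torch
import torch.nn as nn
from torch.nn.utils import prune

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Magnitude pruning: zero out the 50% smallest-magnitude weights in-place.
prune.l1_unstructured(model[0], name="weight", amount=0.5)
prune.remove(model[0], "weight")  # make the pruning permanent

# Dynamic quantization: store Linear weights as int8 and quantize
# activations on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_mb(m):
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32: {serialized_mb(model):.2f} MB, "
      f"int8: {serialized_mb(quantized):.2f} MB")
```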

Research Fields and Future Directions

CNNs remain a hotbed of innovation, with research extending into domains like meta-learning, federated learning, and generative models. The exploration of neural architecture search (NAS) and vision transformers exemplifies the field’s dynamism, signaling a shift towards more adaptive and generalized models. Furthermore, the ongoing interest in areas such as 6D vision and multimodal learning underscores the expansive potential of CNNs to revolutionize how we process and interpret data across a spectrum of applications.

Conclusion

Convolutional Neural Networks have transcended their initial applications, proving to be a cornerstone of the deep learning landscape. This survey provides a holistic overview of convolutional techniques, highlighting their significance, challenges, and the exciting trajectory of ongoing research. As CNNs continue to evolve, their adaptability and performance will undoubtedly unlock new possibilities across diverse fields, further cementing their role in advancing artificial intelligence.

Authors (6)
  1. Abolfazl Younesi
  2. Mohsen Ansari
  3. MohammadAmin Fazli
  4. Alireza Ejlali
  5. Muhammad Shafique
  6. Jörg Henkel