Comparative Analysis of CPU and GPU Profiling for Deep Learning Models (2309.02521v3)

Published 5 Sep 2023 in cs.DC and cs.LG

Abstract: Deep Learning (DL) and Machine Learning (ML) applications are growing rapidly. Massive amounts of data are generated over the internet, from which ML and DL algorithms can derive meaningful results. Hardware resources and open-source libraries have made these algorithms easy to implement. TensorFlow and PyTorch are two of the leading frameworks for implementing ML projects; both allow tracing the operations executed on the GPU and CPU to analyze resource allocation and consumption. This paper presents the time and memory allocation of the CPU and GPU while training deep neural networks using PyTorch. The analysis shows that the GPU has a lower running time than the CPU for deep neural networks, whereas for simpler networks the GPU offers no significant improvement over the CPU.
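The profiling workflow the abstract describes (timing a workload and recording memory while it runs) can be sketched with the Python standard library alone. This is a minimal illustration of the idea, not the paper's actual `torch.profiler` setup; the `profile` helper and the naive matrix-multiply workload are hypothetical stand-ins for a training step.

```python
import time
import tracemalloc

def profile(fn, *args):
    """Measure wall-clock time and peak Python heap usage for one call.

    A simplified stand-in for framework profilers (e.g. torch.profiler),
    which additionally record per-operator CPU/GPU time and allocations.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()  # peak bytes since start()
    tracemalloc.stop()
    return result, elapsed, peak

# Toy CPU workload: naive matrix multiply, standing in for a training step.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

a = [[1.0] * 64 for _ in range(64)]
b = [[2.0] * 64 for _ in range(64)]
c, seconds, peak_bytes = profile(matmul, a, b)
```

Running the same workload on different hardware backends and comparing `seconds` and `peak_bytes` mirrors, in miniature, the CPU-versus-GPU comparison the paper performs with PyTorch's tracing facilities.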

Authors (1)
  1. Dipesh Gyawali
Citations (3)