
Edge Devices Inference Performance Comparison (2306.12093v1)

Published 21 Jun 2023 in cs.LG and cs.CV

Abstract: In this work, we investigate the inference time of the MobileNet family, the EfficientNet V1 and V2 families, VGG models, the ResNet family, and InceptionV3 on four edge platforms: NVIDIA Jetson Nano, Intel Neural Stick, Google Coral USB Dongle, and Google Coral PCIe. Our main contribution is a thorough analysis of these models in multiple settings, in particular as a function of input size, the presence of the classification head, its size, and the scale of the model. Since these architectures are mainly used as feature extractors throughout the industry, we focus on analyzing them as such. We show that the Google platforms offer the fastest average inference time, especially for newer models such as the MobileNet and EfficientNet families, while the Intel Neural Stick is the most universal accelerator, able to run most architectures. These results should provide guidance for engineers in the early stages of AI edge systems development. All results are accessible at https://bulletprove.com/research/edge_inference_results.csv
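As a rough illustration of measuring inference time as a function of input size, the sketch below times a callable over several input sizes with Python's standard `timeit` module. It is not the authors' benchmarking code: `benchmark`, `dummy_extractor`, and the repeat counts are hypothetical names and values chosen for the example, and the dummy function only stands in for a real feature extractor running on an edge accelerator.

```python
import statistics
import timeit


def benchmark(run_inference, input_sizes, repeats=5, number=3):
    """Return {input_size: median seconds per call} for a callable that takes an input size."""
    results = {}
    for size in input_sizes:
        run_inference(size)  # warm-up call, excluded from the measurement
        # Each entry in `timings` is the total time for `number` consecutive calls.
        timings = timeit.repeat(lambda: run_inference(size), repeat=repeats, number=number)
        results[size] = statistics.median(t / number for t in timings)
    return results


def dummy_extractor(size):
    """Hypothetical stand-in for a model: its work grows with size * size."""
    total = 0
    for i in range(size * size):
        total += i
    return total


if __name__ == "__main__":
    for size, seconds in benchmark(dummy_extractor, [32, 64, 128]).items():
        print(f"{size}x{size}: {seconds * 1e3:.3f} ms per call")
```

In a real measurement, `run_inference` would wrap the model invocation for the target device, and the median over repeats helps reduce the effect of occasional slow runs.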

Authors (5)
  1. R. Tobiasz (1 paper)
  2. G. Wilczyński (1 paper)
  3. P. Graszka (1 paper)
  4. N. Czechowski (1 paper)
  5. S. Łuczak (1 paper)