
AI Benchmark: All About Deep Learning on Smartphones in 2019 (1910.06663v1)

Published 15 Oct 2019 in cs.PF and cs.LG

Abstract: The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference. We also discuss the recent changes in the Android ML pipeline and provide an overview of the deployment of deep learning models on mobile devices. All numerical results provided in this paper can be found and are regularly updated on the official project website: http://ai-benchmark.com.

Citations (205)

Summary

  • The paper introduces the AI Benchmark suite that objectively evaluates deep learning performance across over 200 Android devices.
  • The paper details the evolution of mobile AI hardware, showing significant improvements in inference speed and energy efficiency with each NPU generation.
  • The paper examines the trade-offs in deploying floating-point versus quantized models using TensorFlow Lite and NNAPI for optimized mobile AI performance.

Overview of AI Benchmark: Deep Learning Performance on Smartphones

The paper "AI Benchmark: All About Deep Learning on Smartphones in 2019" presents a detailed analysis of the performance of AI accelerators embedded in mobile System-on-Chips (SoCs) from leading manufacturers such as Qualcomm, HiSilicon, Samsung, MediaTek, and Unisoc. With the rapid advancements in mobile AI hardware, the authors conduct an extensive evaluation of how these developments enable the execution of complex deep learning models on smartphones, a task traditionally confined to more powerful desktop GPUs.

The key contribution of the paper is the AI Benchmark suite, which comprises a series of tests intended to measure the inference speed, accuracy, and memory usage of various deep learning models across a range of mobile devices. By providing a standardized platform for assessing the AI capabilities of smartphones, the benchmark allows for objective comparison and highlights trends in mobile AI advancements.
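The latency side of such a benchmark can be sketched with a simple timing harness. This is a minimal illustration, not the suite's actual code: `benchmark_latency` and the dummy workload are hypothetical names, and the real AI Benchmark runs TensorFlow Lite models on-device rather than a Python callable.

```python
import statistics
import time

def benchmark_latency(run_inference, warmup=3, iterations=20):
    """Time repeated calls to an inference function and report latency stats.

    `run_inference` is any zero-argument callable standing in for a model's
    forward pass; a real mobile benchmark would invoke a TFLite interpreter.
    """
    # Warm-up runs let caches, JIT compilation, and accelerator
    # initialization settle before measurement begins.
    for _ in range(warmup):
        run_inference()

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms),
        "min_ms": min(samples_ms),
    }

# A dummy compute-bound workload standing in for a network's forward pass.
stats = benchmark_latency(lambda: sum(i * i for i in range(10_000)))
print(sorted(stats))
```

Reporting the minimum alongside the mean is a common design choice in such harnesses: the minimum approximates the hardware's best case, while the standard deviation exposes thermal throttling or scheduler noise, both of which matter on phones.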

Hardware Acceleration

The paper details the evolution of mobile AI hardware, observing a substantial improvement in performance with each new generation of mobile NPUs. For instance, the fourth generation of mobile NPUs, as found in SoCs like HiSilicon’s Kirin 990, approaches the performance of desktop-class GPUs from just a few years ago. This leap in capability suggests that complex AI tasks can now feasibly be run on mobile platforms.

The authors identify four generations of mobile AI hardware and their associated characteristics, showcasing the progression from initial GPU-accelerated platforms to current dedicated NPU architectures. Qualcomm’s Snapdragon 855, Samsung’s Exynos 9820, MediaTek’s Helio P90, and Unisoc’s Tiger T710 are highlighted as exemplary in enabling efficient AI task execution on smartphones.

Software and Deployment

In a comprehensive review of the Android ML pipeline, the authors discuss the integration of frameworks like TensorFlow Lite, which supports both CPU and specialized hardware acceleration through Android’s Neural Networks API (NNAPI). The transition from TensorFlow Mobile to TensorFlow Lite marks significant progress, with TensorFlow Lite offering reduced binary size and enhanced performance, albeit with some limitations in operation support.

The discussion extends to the pros and cons of deploying floating-point versus quantized models on mobile platforms. Floating-point models, while more accurate, demand greater computational resources, whereas quantized models offer reduced memory footprint and energy consumption but can suffer from accuracy loss.
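The trade-off rests on affine (asymmetric) 8-bit quantization, the scheme TensorFlow Lite's quantized models use: each float is mapped to an int8 value via q = round(x / scale) + zero_point, shrinking storage fourfold at the cost of rounding error. The sketch below illustrates the arithmetic only; the helper names are ours, and production quantization also calibrates `scale` and `zero_point` per tensor or per channel.

```python
def quantize_int8(values, scale, zero_point):
    """Affine 8-bit quantization: q = round(x / scale) + zero_point."""
    qs = []
    for x in values:
        q = round(x / scale) + zero_point
        qs.append(max(-128, min(127, q)))  # clamp to the int8 range
    return qs

def dequantize_int8(quantized, scale, zero_point):
    """Recover approximate floats: x ~ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]

# Toy weights; scale/zero_point would normally come from calibration data.
weights = [0.31, -0.72, 0.05, 1.4]
scale, zero_point = 0.02, 0

q = quantize_int8(weights, scale, zero_point)
restored = dequantize_int8(q, scale, zero_point)
errors = [abs(a - b) for a, b in zip(weights, restored)]

# Rounding error per value is bounded by half a quantization step,
# which is the "accuracy loss" side of the trade-off described above.
print(max(errors) <= scale / 2 + 1e-9)
```

The memory side of the trade-off is immediate: each int8 weight occupies one byte versus four for float32, and integer arithmetic is cheaper on mobile DSPs and NPUs, which is why quantized models also tend to draw less energy.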

Performance Evaluation

The AI Benchmark evaluates more than 200 Android devices, providing comparative performance metrics. Notably, the paper documents how mobile AI accelerators, particularly those using NPUs, significantly outperform the GPU-based solutions of previous generations. The Kirin 990, for example, achieves performance nearing that of a mid-range Nvidia GTX 950 GPU. These results underscore a convergence in performance between mobile and desktop AI processing capabilities, hinting at a future where resource-intensive AI applications may operate seamlessly on mobile devices.

Implications and Future Directions

The findings presented in this paper underscore the pivotal role of mobile AI in expanding the practical applications of deep learning. As smartphone architectures continue to evolve, they hold the potential to drive innovation and democratize access to advanced AI technologies. The authors predict that future developments will likely focus on further integration of AI capabilities into mobile ecosystems, leveraging the enhanced performance of upcoming hardware to enable new use cases and push the boundaries of mobile AI.

Given the rapid pace of advancements, the paper invites further exploration into optimizing AI model deployment, improving software frameworks to better exploit hardware capabilities, and ensuring energy efficiency in mobile AI operations. The AI Benchmark suite serves as both a record of and a contributor to this landscape, setting a foundation for continued measurement and comparison of emerging mobile AI technologies.