- The paper introduces the AI Benchmark suite that objectively evaluates deep learning performance across over 200 Android devices.
- The paper details the evolution of mobile AI hardware, showing significant improvements in inference speed and energy efficiency with each NPU generation.
- The paper examines the trade-offs in deploying floating-point versus quantized models using TensorFlow Lite and NNAPI for optimized mobile AI performance.
Overview of AI Benchmark: Deep Learning Performance on Smartphones
The paper "AI Benchmark: All About Deep Learning on Smartphones in 2019" presents a detailed analysis of the performance of AI accelerators embedded in mobile System-on-Chips (SoCs) from leading manufacturers such as Qualcomm, HiSilicon, Samsung, MediaTek, and Unisoc. Motivated by rapid advances in mobile AI hardware, the authors conduct an extensive evaluation of how these developments enable complex deep learning models to run directly on smartphones, a task traditionally confined to more powerful desktop GPUs.
The key contribution of the paper is the AI Benchmark suite, which comprises a series of tests intended to measure the inference speed, accuracy, and memory usage of various deep learning models across a range of mobile devices. By providing a standardized platform for assessing the AI capabilities of smartphones, the benchmark allows for objective comparison and highlights trends in mobile AI advancements.
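The core of such a benchmark is latency measurement under controlled conditions. The sketch below is an illustrative stand-in, not the suite's actual protocol: it times a toy "model" (a single dense layer implemented as a NumPy matrix multiply), using warm-up runs and a median over repeated timed runs, which is a common way to get stable inference-latency numbers. The function and variable names are my own.

```python
import time
import numpy as np

def benchmark_inference(model_fn, inputs, warmup=3, runs=10):
    """Time a model's forward pass: run warm-up iterations first
    (to exclude one-off initialization costs), then return the
    median latency in milliseconds over the timed runs."""
    for _ in range(warmup):
        model_fn(inputs)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        model_fn(inputs)
        timings.append((time.perf_counter() - start) * 1000.0)
    return float(np.median(timings))

# Stand-in "model": one dense layer with ReLU, as a matrix multiply.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)
dense_layer = lambda x: np.maximum(x @ weights, 0.0)

x = rng.standard_normal((1, 256)).astype(np.float32)
latency_ms = benchmark_inference(dense_layer, x)
print(f"median latency: {latency_ms:.3f} ms")
```

Reporting the median rather than the mean makes the measurement robust to occasional scheduler-induced outliers, which matters on mobile devices where thermal throttling and background activity add noise.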
Hardware Acceleration
The paper details the evolution of mobile AI hardware, observing a substantial improvement in performance with each new generation of mobile NPUs. For instance, the fourth generation of mobile NPUs, as found in SoCs like HiSilicon's Kirin 990, approaches the performance of desktop-class GPUs from just a few years ago. This leap has significant implications: complex AI workloads that once required dedicated desktop hardware can now feasibly run on mobile platforms.
The authors identify four generations of mobile AI hardware and their associated characteristics, showcasing the progression from initial GPU-accelerated platforms to current dedicated NPU architectures. Qualcomm’s Snapdragon 855, Samsung’s Exynos 9820, MediaTek’s Helio P90, and Unisoc’s Tiger T710 are highlighted as exemplary in enabling efficient AI task execution on smartphones.
Software and Deployment
In a comprehensive review of the Android ML pipeline, the authors discuss the integration of frameworks like TensorFlow Lite, which supports both CPU execution and specialized hardware acceleration through Android's Neural Networks API (NNAPI). The transition from TensorFlow Mobile to TensorFlow Lite marks significant progress: TensorFlow Lite offers a reduced binary size and better performance, albeit with a more limited set of supported operations.
The discussion extends to the trade-offs of deploying floating-point versus quantized models on mobile platforms. Floating-point models are more accurate but demand greater computational resources, whereas quantized (e.g., INT8) models reduce memory footprint and energy consumption at the cost of some accuracy.
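The memory and accuracy sides of this trade-off can be made concrete with a small sketch of affine (asymmetric) quantization, the scheme that TFLite-style INT8 quantization is based on: each float is mapped to an 8-bit integer via a scale and zero point, cutting storage 4x while introducing a bounded rounding error. The function names here are illustrative, not a real TFLite API.

```python
import numpy as np

def quantize_int8(x):
    """Affine quantization of a float32 tensor to int8:
    q = round(x / scale) + zero_point, clipped to [-128, 127].
    The scale maps the tensor's value range onto 256 levels."""
    scale = (x.max() - x.min()) / 255.0
    zero_point = int(round(-128 - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(42)
x = rng.standard_normal(1000).astype(np.float32)

q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)

print(f"memory: {x.nbytes} B (float32) vs {q.nbytes} B (int8)")
print(f"max round-trip error: {np.abs(x - x_hat).max():.4f}")
```

The round-trip error is on the order of half the scale per element, which is why quantization works well for weights with a compact value range but can degrade accuracy for layers with wide or outlier-heavy distributions.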
Performance Evaluation
The AI Benchmark evaluates more than 200 Android devices, providing comparative performance metrics. Notably, the paper documents how mobile AI accelerators, particularly dedicated NPUs, significantly outperform the GPU-based solutions of previous generations. The Kirin 990, for example, approaches the performance of Nvidia's desktop GeForce GTX 950 GPU. These results underscore a convergence between mobile and desktop AI processing capabilities, hinting at a future where resource-intensive AI applications may operate seamlessly on mobile devices.
Implications and Future Directions
The findings presented in this paper underscore the pivotal role of mobile AI in expanding the practical applications of deep learning. As smartphone architectures continue to evolve, they hold the potential to drive innovation and democratize access to advanced AI technologies. The authors predict that future developments will likely focus on further integration of AI capabilities into mobile ecosystems, leveraging the enhanced performance of upcoming hardware to enable new use cases and push the boundaries of mobile AI.
Given the rapid pace of advancements, the paper invites further exploration into optimizing AI model deployment, improving software frameworks to better exploit hardware capabilities, and ensuring energy efficiency in mobile AI operations. The AI Benchmark suite serves as both a record of and a contributor to this progress, setting a foundation for continued measurement and comparison of emerging mobile AI technologies.