AI Benchmark: Running Deep Neural Networks on Android Smartphones (1810.01109v2)

Published 2 Oct 2018 in cs.AI and cs.CV

Abstract: Over the last years, the computational power of mobile devices such as smartphones and tablets has grown dramatically, reaching the level of desktop computers available not long ago. While standard smartphone apps are no longer a problem for them, there is still a group of tasks that can easily challenge even high-end devices, namely running artificial intelligence algorithms. In this paper, we present a study of the current state of deep learning in the Android ecosystem and describe available frameworks, programming models and the limitations of running AI on smartphones. We give an overview of the hardware acceleration resources available on four main mobile chipset platforms: Qualcomm, HiSilicon, MediaTek and Samsung. Additionally, we present the real-world performance results of different mobile SoCs collected with AI Benchmark that are covering all main existing hardware configurations.

Citations (309)

Summary

  • The paper introduces the AI Benchmark tool to deliver empirical performance metrics for deep neural networks on Android.
  • The paper details hardware acceleration across Qualcomm, HiSilicon, MediaTek, and Samsung chipsets using NPUs and DSPs.
  • The paper compares TensorFlow Mobile with TensorFlow Lite, revealing trade-offs in model support and optimization for AI tasks.

AI Benchmark: Running Deep Neural Networks on Android Smartphones

The computational power of mobile devices has increased dramatically in recent years, bringing many smartphones to the level of desktop computers from only a few years ago. However, running complex AI algorithms continues to challenge even high-end mobile devices. The paper "AI Benchmark: Running Deep Neural Networks on Android Smartphones" provides an extensive analysis of the capabilities and constraints of deploying deep learning models on Android devices. It examines the available frameworks and programming models, surveys the hardware acceleration potential of various mobile SoCs, and presents empirical data on how these devices perform under AI workloads.

Overview of Deep Learning and Mobile Platforms

The paper begins by emphasizing the growing role of deep learning on mobile platforms, focusing on tasks such as computer vision, NLP, and sensor data processing. These tasks are computationally intensive and therefore demand substantial hardware support to achieve efficient execution with minimal power consumption. Modern phones integrate System-on-Chip (SoC) solutions that combine CPUs, GPUs, and sometimes DSPs or NPUs to increase computational throughput. The evaluation covers four principal mobile chipset manufacturers: Qualcomm, HiSilicon, MediaTek, and Samsung.

Hardware Acceleration and Mobile SoCs

One of the most salient aspects addressed in the paper is the hardware acceleration available for AI tasks. Qualcomm, with its Snapdragon Neural Processing Engine (SNPE), supports frameworks such as Caffe and TensorFlow, offering heterogeneous computing across its CPU, GPU, and DSP. HiSilicon's Kirin chipsets use dedicated NPUs for enhanced AI performance, while MediaTek's NeuroPilot and Samsung's more constrained but evolving Exynos platform leverage ARM-based processing enhancements such as NEON and DynamIQ.
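The heterogeneous-computing model these SDKs expose can be pictured as picking the fastest available accelerator with a CPU fallback. The sketch below is purely illustrative — it is not the SNPE or NeuroPilot API, and the runtime names and preference order are assumptions — but it shows the dispatch pattern such SDKs typically follow.

```python
# Hypothetical sketch of accelerator selection with CPU fallback.
# Not an actual vendor API; runtime names and ordering are illustrative.

PREFERRED_RUNTIMES = ["NPU", "DSP", "GPU", "CPU"]  # assumed fastest-first order

def select_runtime(available, preferred=PREFERRED_RUNTIMES):
    """Return the first preferred runtime the device reports as
    available; fall back to CPU, which is always present."""
    for runtime in preferred:
        if runtime in available:
            return runtime
    return "CPU"  # universal fallback

# Example: a device exposing a GPU and a DSP but no dedicated NPU
print(select_runtime({"CPU", "GPU", "DSP"}))  # prints "DSP"
```

In practice, whether the preferred runtime is actually usable also depends on driver support and on which operations the accelerator implements, which is part of the inconsistency the paper observes.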

The AI Benchmark

The paper introduces the AI Benchmark, a diagnostic tool devised to assess deep learning performance on Android devices. The benchmark evaluates multiple neural network models across different deep learning tasks, including image classification, face recognition, and image super-resolution, among others. Each test is chosen to reflect the diverse and intensive computational requirements typical of mobile AI applications. The AI Benchmark also provides insight into the effectiveness of hardware accelerators, quantization techniques, and RAM limitations during model execution.
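At its core, this kind of benchmarking amounts to timing repeated inference runs after discarding cold-start iterations. The harness below is a minimal, hypothetical sketch of that pattern in pure Python — it is not the AI Benchmark app's actual measurement code, and the warmup/iteration counts are arbitrary assumptions.

```python
import time

def benchmark(run_inference, warmup=2, iters=10):
    """Time repeated calls to an inference callable and return the
    average latency in milliseconds. Hypothetical harness, not the
    AI Benchmark app's measurement code."""
    for _ in range(warmup):              # discard cold-start runs
        run_inference()
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0      # ms per inference

# Usage: time any callable standing in for a model's forward pass
avg_ms = benchmark(lambda: sum(range(100_000)))
```

Warmup runs matter on mobile SoCs in particular, since first-run latency can include driver initialization and model compilation for the target accelerator.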

Key Findings

  1. Empirical Performance Metrics:
    • The benchmark reveals disparate performance across devices, largely contingent on the presence and efficiency of hardware acceleration. Notably, HiSilicon's Kirin 970 significantly outperformed the others thanks to its dedicated NPU, which accelerates float models but lacks quantized-model support, preventing optimal use of all model types.
  2. Influence of System-on-Chip Architectures:
    • Devices equipped with Qualcomm’s DSPs or MediaTek’s APU demonstrate potential efficiency gains, particularly with quantized models, showcasing innovative approaches in parallelizing AI tasks on mobile platforms. However, consistency in performance still varies, particularly due to the reliance on OEM integration and driver support.
  3. Framework Flexibility and Limitations:
    • TensorFlow Mobile and TensorFlow Lite are evaluated, elucidating the trade-offs between maturity and performance optimization, with TensorFlow Lite offering a route toward leveraging Android NNAPI for hardware-accelerated execution. Nonetheless, limitations in operation support pose challenges for intricate model deployment.

Implications and Future Directions

The findings implicitly call for standardized hardware acceleration support across mobile AI platforms. The prospects for AI on mobile are vast and multifaceted. Future developments are likely to bring more dedicated AI cores, expanded operation support in frameworks like TensorFlow Lite, and refined quantization methods that balance model precision against inference speed. These ongoing optimization efforts underscore a continuing push to make sophisticated AI accessible on handheld devices, reshaping the landscape of mobile computing. Continued collaboration among chipset vendors, software developers, and the deep learning community is vital to paving the way forward.

The AI Benchmark provides a valuable, quantitative bridge between the potential capabilities offered by sophisticated hardware and the complex requirements of modern AI applications within the pervasive environment of mobile computing.