Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization (2004.11250v1)

Published 22 Apr 2020 in cs.LG, cs.CV, and cs.MM

Abstract: High-end mobile platforms rapidly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources on these devices still pose significant challenges for real-time DNN inference executions. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices. This demo shows that these optimizations can enable real-time mobile execution of multiple DNN applications, including style transfer, DNN coloring and super resolution.
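The abstract does not spell out the pruning criterion, but "hardware-friendly structured pruning" typically means removing whole filters or channels so the remaining tensor stays dense. As a rough illustration only (magnitude-based filter ranking is a common heuristic, not necessarily the authors' exact method), a structured pruning pass over a convolutional layer might look like:

```python
import numpy as np

def prune_filters(weights: np.ndarray, keep_ratio: float):
    """Keep the top fraction of conv filters ranked by L2 norm.

    weights: array of shape (out_channels, in_channels, kH, kW).
    Returns the pruned weight tensor and the kept filter indices,
    so downstream layers can be sliced to match.
    """
    # One L2 norm per output filter.
    norms = np.linalg.norm(weights.reshape(weights.shape[0], -1), axis=1)
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    # Indices of the largest-norm filters, in original order.
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])
    return weights[keep], keep

# Toy example: a conv layer with 8 filters, prune half of them.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))
pruned, kept = prune_filters(w, 0.5)
print(pruned.shape)  # (4, 3, 3, 3)
```

Because entire filters are removed, the pruned layer remains a regular dense convolution, which is what makes this style of pruning amenable to the compiler-level optimizations the paper pairs it with.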

Authors (6)
  1. Wei Niu (68 papers)
  2. Pu Zhao (82 papers)
  3. Zheng Zhan (27 papers)
  4. Xue Lin (92 papers)
  5. Yanzhi Wang (197 papers)
  6. Bin Ren (136 papers)
Citations (5)