
Architecture Aware Latency Constrained Sparse Neural Networks (2109.00170v1)

Published 1 Sep 2021 in cs.CV

Abstract: Accelerating deep neural networks to meet a specific latency constraint is essential for their deployment on mobile devices. In this paper, we design an architecture-aware latency-constrained sparse (ALCS) framework to prune and accelerate CNN models. Taking modern mobile computation architectures into consideration, we propose Single Instruction Multiple Data (SIMD)-structured pruning, along with a novel sparse convolution algorithm for efficient computation. In addition, we propose estimating the runtime of sparse models with piecewise linear interpolation. The whole latency-constrained pruning task is formulated as a constrained optimization problem that can be efficiently solved with the Alternating Direction Method of Multipliers (ADMM). Extensive experiments show that our system-algorithm co-design framework achieves a much better Pareto frontier between network accuracy and latency on resource-constrained mobile devices.
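Two of the abstract's mechanisms can be sketched concretely. First, SIMD-structured pruning removes weights in contiguous groups matching the SIMD lane width, so the surviving nonzeros can still be processed with vector instructions. A minimal NumPy sketch, assuming magnitude-based group scoring and a group size that evenly divides the weight tensor; the paper's exact grouping axis and saliency criterion may differ:

```python
import numpy as np

def simd_structured_prune(weight, simd_width=4, sparsity=0.5):
    # Group the flattened weights into contiguous SIMD-width lanes.
    # Assumes weight.size is divisible by simd_width.
    groups = weight.reshape(-1, simd_width).copy()
    scores = np.linalg.norm(groups, axis=1)      # one saliency score per group
    n_prune = int(len(scores) * sparsity)        # number of groups to zero
    groups[np.argsort(scores)[:n_prune]] = 0.0   # drop the lowest-norm groups
    return groups.reshape(weight.shape)
```

Second, latency estimation: rather than profiling every candidate sparsity level on-device, a few measured (sparsity, latency) pairs are interpolated piecewise linearly. A sketch with hypothetical per-layer measurements (real values would come from benchmark runs):

```python
import numpy as np

# Hypothetical profiling data: latency measured at a few sparsity ratios.
sparsity_samples = np.array([0.0, 0.25, 0.5, 0.75, 0.9])
latency_ms       = np.array([12.0, 9.6, 6.8, 4.1, 2.9])

def estimate_latency(s):
    # np.interp performs piecewise linear interpolation between the samples.
    return float(np.interp(s, sparsity_samples, latency_ms))

print(estimate_latency(0.6))   # ~5.72 ms, between the 0.5 and 0.75 samples
```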

Authors (7)
  1. Tianli Zhao (5 papers)
  2. Qinghao Hu (31 papers)
  3. Xiangyu He (19 papers)
  4. Weixiang Xu (9 papers)
  5. Jiaxing Wang (16 papers)
  6. Cong Leng (13 papers)
  7. Jian Cheng (127 papers)