Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes (2206.08852v2)

Published 17 Jun 2022 in cs.LG

Abstract: Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision works layer-wise, i.e., it uses different bit-widths for the weights and activations tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs model size and accuracy vs energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% and 27% respectively compared to a layer-wise approach, for the same accuracy.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Matteo Risso (20 papers)
  2. Alessio Burrello (52 papers)
  3. Luca Benini (362 papers)
  4. Enrico Macii (37 papers)
  5. Massimo Poncino (34 papers)
  6. Daniele Jahier Pagliari (46 papers)
Citations (11)

Summary

We haven't generated a summary for this paper yet.