Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation (2005.03403v2)

Published 7 May 2020 in cs.LG and stat.ML

Abstract: We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2. To our best knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas: sparsification or pruning, decomposition, and quantization, into one unified framework. The resulting sparse and readily-quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including merely sparsification or pruning, decomposition, and quantization, in various ablation studies based on nine DNN models and four datasets; and 2) on the hardware level, the proposed SmartExchange based accelerator can improve the energy efficiency by up to 6.7$\times$ and the speedup by up to 19.2$\times$ over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (including four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (9)
  1. Yang Zhao (382 papers)
  2. Xiaohan Chen (30 papers)
  3. Yue Wang (676 papers)
  4. Chaojian Li (34 papers)
  5. Haoran You (33 papers)
  6. Yonggan Fu (49 papers)
  7. Yuan Xie (188 papers)
  8. Zhangyang Wang (375 papers)
  9. Yingyan Lin (67 papers)
Citations (43)

Summary

We haven't generated a summary for this paper yet.