
Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference (2403.05465v2)

Published 8 Mar 2024 in cs.AR, cs.AI, cs.LG, and cs.NE

Abstract: Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamically adapts to DNN weight/activation distributions by parameterizing LP bit fields. We also develop a novel genetic-algorithm-based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters while reducing representational divergence between quantized and full-precision models through a novel global-local contrastive objective. Additionally, we design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) that incorporate LP in the computational datapath. Our algorithm-hardware co-design demonstrates an average <1% drop in top-1 accuracy across various CNN and ViT models. It also achieves ~2x improvements in performance per unit area and 2.2x gains in energy efficiency compared to state-of-the-art quantization accelerators using different data types.
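
The abstract describes LP as a posit-inspired encoding whose bit fields can be reparameterized to match layer-wise weight/activation distributions, with values interpreted in the log domain. The following is a minimal, illustrative Python decoder sketched under that reading; the field layout (sign, run-length regime, es exponent bits, fraction), the es parameter, and the log-domain treatment of the fraction are assumptions made for illustration, not the paper's exact LP definition.

    # Toy sketch of a logarithmic-posit-style decode (illustrative only).
    # Assumed layout: [sign | regime (run-length coded) | es exponent bits | fraction],
    # with the fraction read as the fractional part of the log2 magnitude:
    #   value = (-1)^sign * 2^(regime * 2^es + exponent + fraction / 2^fraction_bits)

    def decode_lp(bits: int, n: int = 8, es: int = 1) -> float:
        """Decode an n-bit logarithmic-posit-style code into a float (toy model)."""
        if bits == 0:
            return 0.0
        sign = (bits >> (n - 1)) & 1
        body = bits & ((1 << (n - 1)) - 1)
        if sign:
            # Posit-style decoding takes the two's complement of negative codes.
            body = ((1 << (n - 1)) - body) & ((1 << (n - 1)) - 1)
            if body == 0:  # the 1000...0 pattern is Not-a-Real in posits
                return float("nan")
        # Regime: run of identical leading bits, terminated by the opposite bit.
        rem = n - 1
        first = (body >> (rem - 1)) & 1
        run = 0
        while run < rem and ((body >> (rem - 1 - run)) & 1) == first:
            run += 1
        regime = run - 1 if first == 1 else -run
        rem -= min(run + 1, rem)  # consume the run and its terminating bit
        # Exponent: next es bits, zero-padded on the right if the word runs out.
        exp_bits = min(es, rem)
        exponent = ((body >> (rem - exp_bits)) & ((1 << exp_bits) - 1)) << (es - exp_bits)
        rem -= exp_bits
        # Fraction: remaining bits, read as the fractional part of log2|value|.
        frac = (body & ((1 << rem) - 1)) / (1 << rem) if rem > 0 else 0.0
        log2_mag = regime * (1 << es) + exponent + frac
        return (-1.0) ** sign * 2.0 ** log2_mag

With n=8 and es=1, patterns whose fraction bits are zero (e.g. 0b01000000 -> 1.0, 0b01100000 -> 4.0) match standard posit values, while 0b01001000 decodes to 2^0.5 ~ 1.41 here rather than the linear-fraction posit value 1.5. This multiplicative spacing is the usual motivation for log-domain encodings: multiplication reduces to fixed-point addition of the log values in hardware.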

Authors (5)
  1. Akshat Ramachandran (7 papers)
  2. Zishen Wan (33 papers)
  3. Geonhwa Jeong (12 papers)
  4. John Gustafson (5 papers)
  5. Tushar Krishna (87 papers)
Citations (8)
