Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference (2403.05465v2)

Published 8 Mar 2024 in cs.AR, cs.AI, cs.LG, and cs.NE

Abstract: Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamically adapts to DNN weight/activation distributions by parameterizing LP bit fields. We also develop a novel genetic-algorithm-based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters while reducing representational divergence between quantized and full-precision models through a novel global-local contrastive objective. Additionally, we design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) that incorporate LP in the computational datapath. Our algorithm-hardware co-design demonstrates an average top-1 accuracy drop of <1% across various CNN and ViT models, and achieves ~2x improvement in performance per unit area and 2.2x gains in energy efficiency compared to state-of-the-art quantization accelerators using different data types.
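As a rough illustration of the posit-inspired, log-domain encoding described above, the sketch below decodes a bit pattern laid out with standard posit fields (sign, run-length regime, es exponent bits, remaining fraction bits) but interprets the fraction as the fractional part of a base-2 exponent rather than a linear mantissa. The field layout, parameter names, and default widths are illustrative assumptions, not the paper's exact LP specification; in the paper, the LP bit-field parameters are tuned per layer by LPQ.

```python
# Minimal sketch (Python) of decoding a posit-style, log-domain encoding.
# Assumed layout: [sign | regime (run-length) | exponent (es bits) | fraction],
# following standard posit conventions; the "logarithmic" twist is that the
# fraction bits are read as the fractional part of a base-2 exponent.
# This is an illustrative assumption, not the paper's exact LP format.

def decode_log_posit(bits: int, n_bits: int = 8, es: int = 1) -> float:
    """Decode an n_bits-wide integer `bits` as a log-domain, posit-like value."""
    mask = (1 << n_bits) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n_bits - 1):          # posit "NaR" (not a real) pattern
        return float("nan")

    sign = (bits >> (n_bits - 1)) & 1
    if sign:                                # posits negate via two's complement
        bits = (-bits) & mask

    body = (bits << 1) & mask               # drop the sign bit
    first = (body >> (n_bits - 1)) & 1

    # Regime: run length of identical leading bits, terminated by its opposite.
    run, pos = 0, n_bits - 1
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    pos -= 1                                # skip the terminating bit
    k = run - 1 if first else -run

    # Exponent: up to `es` bits; missing trailing bits are treated as zero.
    exp = 0
    for _ in range(es):
        exp <<= 1
        if pos >= 0:
            exp |= (body >> pos) & 1
            pos -= 1

    # Fraction: remaining bits, interpreted as the fractional part of the
    # exponent (log domain) instead of a linear mantissa.
    frac_bits = pos + 1
    frac = (body & ((1 << frac_bits) - 1)) if frac_bits > 0 else 0
    log_scale = k * (1 << es) + exp + (frac / (1 << frac_bits) if frac_bits > 0 else 0.0)

    value = 2.0 ** log_scale
    return -value if sign else value

# Example: with n_bits=8, es=1, the pattern 0b01100000 decodes to 4.0
# (regime k=1, exponent 0, zero fractional exponent), matching a standard posit.
print(decode_log_posit(0b01100000))        # 4.0
print(decode_log_posit(0b11000000))        # -1.0
```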

Authors (5)
  1. Akshat Ramachandran (7 papers)
  2. Zishen Wan (33 papers)
  3. Geonhwa Jeong (12 papers)
  4. John Gustafson (5 papers)
  5. Tushar Krishna (87 papers)
Citations (8)