
DSLOT-NN: Digit-Serial Left-to-Right Neural Network Accelerator (2309.06019v2)

Published 12 Sep 2023 in cs.AR, cs.AI, and cs.PF

Abstract: We propose a Digit-Serial Left-tO-righT (DSLOT) arithmetic-based processing technique, called DSLOT-NN, that aims to accelerate inference of the convolution operation in deep neural networks (DNNs). The proposed work can assess and terminate ineffective convolutions, resulting in massive power and energy savings. The processing engine comprises low-latency most-significant-digit-first (MSDF) (also called online) multipliers and adders that process data from left to right, allowing subsequent operations to execute in a digit-pipelined manner. The use of online operators eliminates the need for a complex mechanism to identify negative activations: since the most significant digits of the output are generated first, the sign of the result can be determined as soon as the first non-zero digit appears. The precision of the online operators can be tuned at run time, making them extremely useful in situations where accuracy can be traded for power and energy savings. The proposed design has been implemented on a Xilinx Virtex-7 FPGA and compared with the state-of-the-art Stripes accelerator on various performance metrics. The results show that the proposed design achieves power savings, a shorter cycle time, and approximately 50% higher OPS per watt.
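The early-termination idea in the abstract can be illustrated with a small sketch. This is not the paper's hardware design, only a Python model of the underlying property: in a redundant signed-digit MSDF representation (digits in {-1, 0, 1}, most significant first), the sign of a result is fixed by its first non-zero digit, so a convolution whose output will be zeroed by ReLU can be abandoned after only a few digit cycles. The conversion routine and function names here are illustrative assumptions.

```python
def msdf_digits(value, num_digits):
    """Emit a radix-2 signed-digit stream (digits in {-1, 0, 1}) for a
    fraction in (-1, 1), most significant digit first. A simple
    non-redundant conversion is used purely for illustration; real
    online arithmetic produces such streams directly during computation."""
    sign = -1 if value < 0 else 1
    mag = abs(value)
    for _ in range(num_digits):
        mag *= 2
        if mag >= 1:
            mag -= 1
            yield sign
        else:
            yield 0

def early_sign(digit_stream):
    """Return the sign and the number of digits inspected before it was
    known -- the property DSLOT-NN exploits to terminate ineffective
    (negative-result) convolutions before all digits are computed."""
    for position, d in enumerate(digit_stream):
        if d != 0:
            return ("negative" if d < 0 else "positive"), position + 1
    return "zero", None

# A negative partial result is detected after inspecting a single digit,
# so the remaining digit cycles (and their energy) can be skipped for ReLU.
sign, cycles = early_sign(msdf_digits(-0.8125, 8))
```

In the sketch, `early_sign(msdf_digits(-0.8125, 8))` resolves the sign after one digit; a bit-serial least-significant-first design would need all eight digits before the sign is known, which is the contrast the paper draws with accelerators such as Stripes.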

References (17)
  1. G. Gao, Z. Xu, J. Li, J. Yang, T. Zeng, and G.-J. Qi, “Ctcnet: A cnn-transformer cooperation network for face image super-resolution,” IEEE Transactions on Image Processing, vol. 32, pp. 1978–1991, 2023.
  2. M. Usman, S. Khan, and J.-A. Lee, “Afp-lse: Antifreeze proteins prediction using latent space encoding of composition of k-spaced amino acid pairs,” Scientific Reports, vol. 10, no. 1, p. 7197, 2020.
  3. M. Usman, S. Khan, S. Park, and J.-A. Lee, “Aop-lse: Antioxidant proteins classification using deep latent space encoding of sequence features,” Current Issues in Molecular Biology, vol. 43, no. 3, pp. 1489–1501, 2021.
  4. H. Kwon, P. Chatarasi, M. Pellauer, A. Parashar, V. Sarkar, and T. Krishna, “Understanding reuse, performance, and hardware cost of dnn dataflow: A data-centric approach,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019, pp. 754–768.
  5. S. Jain, S. Venkataramani, V. Srinivasan, J. Choi, P. Chuang, and L. Chang, “Compensated-dnn: Energy efficient low-precision deep neural networks by compensating quantization errors,” in Proceedings of the 55th Annual Design Automation Conference, 2018, pp. 1–6.
  6. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition.   IEEE, 2009, pp. 248–255.
  7. N. P. Jouppi, C. Young, N. Patil, and D. Patterson, “A domain-specific architecture for deep neural networks,” Communications of the ACM, vol. 61, no. 9, pp. 50–59, 2018.
  8. L. R. Juracy, R. Garibotti, F. G. Moraes et al., “From cnn to dnn hardware accelerators: A survey on design, exploration, simulation, and frameworks,” Foundations and Trends® in Electronic Design Automation, vol. 13, no. 4, pp. 270–344, 2023.
  9. P. Judd, J. Albericio, T. Hetherington, T. M. Aamodt, and A. Moshovos, “Stripes: Bit-serial deep neural network computing,” in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).   IEEE, 2016, pp. 1–12.
  10. J. Lee, C. Kim, S. Kang, D. Shin, S. Kim, and H.-J. Yoo, “Unpu: An energy-efficient deep neural network accelerator with fully variable weight bit precision,” IEEE Journal of Solid-State Circuits, vol. 54, no. 1, pp. 173–185, 2018.
  11. V. Akhlaghi, A. Yazdanbakhsh, K. Samadi, R. K. Gupta, and H. Esmaeilzadeh, “Snapea: Predictive early activation for reducing computation in deep convolutional neural networks,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).   IEEE, 2018, pp. 662–673.
  12. D. Lee, S. Kang, and K. Choi, “Compend: Computation pruning through early negative detection for relu in a deep neural network accelerator,” in Proceedings of the 2018 International Conference on Supercomputing, 2018, pp. 139–148.
  13. N. Kim, H. Park, D. Lee, S. Kang, J. Lee, and K. Choi, “Compreend: Computation pruning through predictive early negative detection for relu in a deep neural network accelerator,” IEEE Transactions on Computers, 2021.
  14. M. D. Ercegovac, “On-Line Arithmetic: An Overview,” in Real-Time Signal Processing VII, K. Bromley, Ed., vol. 0495, International Society for Optics and Photonics.   SPIE, 1984, pp. 86 – 93. [Online]. Available: https://doi.org/10.1117/12.944012
  15. M. Usman, M. D. Ercegovac, and J.-A. Lee, “Low-latency online multiplier with reduced activities and minimized interconnect for inner product arrays,” Journal of Signal Processing Systems, pp. 1–20, 2023.
  16. X. Chen, J. Zhu, J. Jiang, and C.-Y. Tsui, “Comprrae: RRAM-based convolutional neural network accelerator with reduced computations through a runtime activation estimation,” in Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019, pp. 133–139.
  17. L. Deng, “The mnist database of handwritten digit images for machine learning research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.