Number Systems for Deep Neural Network Architectures: A Survey (2307.05035v1)

Published 11 Jul 2023 in cs.NE, cs.AR, and cs.LG

Abstract: Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications, at times matching or exceeding human performance in areas such as self-driving and healthcare. Because of their computational complexity, however, deploying DNNs on resource-constrained devices still faces many challenges related to energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Conventional number systems have been found to be sub-optimal for DNNs, and a large body of research therefore focuses on exploring more suitable alternatives. This article provides a comprehensive survey and discussion of alternative number systems for more efficient representation of DNN data. Various number systems (conventional and unconventional) exploited for DNNs are discussed, and their impact on the performance and hardware design of DNNs is considered. In addition, the paper highlights the challenges associated with each number system and the solutions proposed to address them. The reader will be able to understand the importance of an efficient number system for DNNs, learn about the number systems widely used for DNNs, understand the trade-offs between them, and consider the design aspects that affect the impact of number systems on DNN performance. Finally, recent trends and related research opportunities are highlighted.
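To make the representation trade-off concrete, the short Python sketch below (not taken from the paper; the function name, bit-widths, and toy data are illustrative assumptions) quantizes a float32 weight tensor to 8-bit signed fixed point and measures the resulting error, the kind of accuracy-versus-bit-budget tension that motivates the alternative number systems the survey covers.

import numpy as np

def to_fixed_point(weights, total_bits=8, frac_bits=6):
    # Quantize float weights to signed fixed point with frac_bits fractional bits.
    scale = 1 << frac_bits                                   # quantization step is 2**-frac_bits
    qmin, qmax = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = np.clip(np.round(weights * scale), qmin, qmax)       # round, then saturate to the integer range
    return q / scale                                         # dequantize so the error can be inspected

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.25, size=(256, 256)).astype(np.float32)  # toy "weight" matrix
w_q = to_fixed_point(w)
print("mean absolute quantization error:", float(np.abs(w - w_q).mean()))

Narrower formats reduce storage and multiplier cost but increase this error; each number system discussed in the survey strikes that balance differently.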

Authors (6)
  1. Ghada Alsuhli (3 papers)
  2. Vasileios Sakellariou (1 paper)
  3. Hani Saleh (10 papers)
  4. Mahmoud Al-Qutayri (19 papers)
  5. Baker Mohammad (10 papers)
  6. Thanos Stouraitis (4 papers)
Citations (3)