
NASH: Neural Architecture Search for Hardware-Optimized Machine Learning Models (2403.01845v2)

Published 4 Mar 2024 in cs.LG, cs.AI, and cs.CV

Abstract: As ML algorithms get deployed in an ever-increasing number of applications, these algorithms need to achieve better trade-offs between high accuracy, high throughput, and low latency. This paper introduces NASH, a novel approach that applies neural architecture search to machine learning hardware. Using NASH, hardware designs can achieve not only high throughput and low latency but also superior accuracy. We present four versions of the NASH strategy in this paper, all of which show higher accuracy than the original models. The strategy can be applied to various convolutional neural networks, selecting specific model operations among many candidates to guide the training process toward higher accuracy. Experimental results show that applying NASH to ResNet18 or ResNet34 achieves a top-1 accuracy increase of up to 3.1% and a top-5 accuracy increase of up to 2.2% compared to the non-NASH version when tested on the ImageNet dataset. We also integrated this approach into the FINN hardware model synthesis tool to automate the application of our approach and the generation of the hardware model. Results show that using FINN can achieve a maximum throughput of 324.5 fps. In addition, NASH models can also result in a better trade-off between accuracy and hardware resource utilization. The accuracy-hardware (HW) Pareto curve shows that the models with the four NASH versions represent the best trade-offs, achieving the highest accuracy for a given HW utilization. The code for our implementation is open-source and publicly available on GitHub at https://github.com/MFJI/NASH.
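The abstract describes selecting specific model operations among many candidates to guide training toward higher accuracy. As a rough illustration of how such operation selection is commonly expressed in PyTorch, the sketch below mixes candidate operations with learnable architecture weights in the style of differentiable NAS (e.g., DARTS). This is not the authors' NASH implementation; the class name, candidate set, and weighting scheme are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CandidateOpCell(nn.Module):
    """Illustrative cell that mixes several candidate convolutions with
    learnable architecture weights (DARTS-style). Hypothetical sketch,
    not the NASH code from the paper."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical candidate operations to choose among.
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.Identity(),  # skip connection
        ])
        # One architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # During the search phase, the output is a softmax-weighted
        # sum over all candidate operations.
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def selected_op(self) -> nn.Module:
        # After search, keep only the highest-weighted operation for
        # the final, hardware-friendly model.
        return self.ops[int(torch.argmax(self.alpha))]
```

In this kind of scheme, the architecture parameters are trained jointly with (or alternately to) the model weights, and the discretized network that keeps only the selected operations is what would then be quantized and passed to a synthesis flow such as FINN.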

