FPGA Resource-aware Structured Pruning for Real-Time Neural Networks (2308.05170v2)

Published 9 Aug 2023 in cs.AR and cs.AI

Abstract: Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis, and many other application areas. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning and quantization, have been proposed in the literature. Pruning sparsifies a neural network, reducing the number of multiplications and the memory footprint. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balancing inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning as a knapsack problem over resource-aware tensor structures. Evaluated on a range of tasks, including sub-microsecond particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method achieves reductions between 55% and 92% in DSP utilization and up to 81% in BRAM utilization.
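The knapsack view of pruning described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual method; it assumes hypothetical inputs where each prunable structure (e.g. a group of weights mapped to the same hardware resource) has a saliency score and an integer resource cost (e.g. DSPs consumed), and a 0/1 knapsack dynamic program selects which structures to keep under a total resource budget:

```python
import numpy as np

def knapsack_prune(saliency, cost, budget):
    """Select which weight structures to keep under a resource budget.

    saliency: per-structure importance scores (e.g. L2 norms) -- assumed metric
    cost:     per-structure integer resource cost (e.g. DSPs consumed)
    budget:   total resource budget (integer capacity)
    Returns a boolean keep-mask over the structures.
    """
    n = len(saliency)
    # dp[c] = best total saliency achievable with capacity c
    dp = np.zeros(budget + 1)
    # choice[i, c] records whether structure i is taken at capacity c
    choice = np.zeros((n, budget + 1), dtype=bool)
    for i in range(n):
        # Iterate capacity downwards so each structure is used at most once
        for c in range(budget, cost[i] - 1, -1):
            if dp[c - cost[i]] + saliency[i] > dp[c]:
                dp[c] = dp[c - cost[i]] + saliency[i]
                choice[i, c] = True
    # Backtrack through the choice table to recover the kept structures
    keep = np.zeros(n, dtype=bool)
    c = budget
    for i in range(n - 1, -1, -1):
        if choice[i, c]:
            keep[i] = True
            c -= cost[i]
    return keep

# Example: three structures, DSP budget of 4
mask = knapsack_prune(np.array([3.0, 1.0, 4.0]), [2, 1, 3], budget=4)
# Structures 1 and 2 fit the budget with maximal total saliency
```

Structures not selected by the mask would then be zeroed out and the network fine-tuned; the paper additionally builds hardware resource awareness into the tensor structures themselves, which this sketch does not model.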

