Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications (2403.08980v1)

Published 13 Mar 2024 in cs.LG and cs.AR

Abstract: As more scientific fields rely on neural networks (NNs) to process data arriving at extreme throughputs and latencies, it becomes crucial to develop NNs with all of their parameters stored on-chip. In many of these applications, there is not enough time to go off-chip and retrieve weights; moreover, off-chip memory such as DRAM lacks the bandwidth required to feed these NNs as fast as the data is produced (e.g., every 25 ns). These extreme latency and bandwidth requirements have architectural implications for the hardware intended to run such NNs: 1) all NN parameters must fit on-chip, and 2) co-designing custom/reconfigurable logic is often required to meet the latency and bandwidth constraints. In our work, we show that many scientific NN applications must run fully on-chip, in the extreme case requiring a custom chip to meet such stringent constraints.
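
To make the bandwidth argument concrete, here is a minimal back-of-envelope sketch (not from the paper; the model size, weight precision, and DRAM figure are illustrative assumptions). It computes the memory bandwidth needed to stream a model's weights from off-chip memory once per inference when a new input arrives every 25 ns (the LHC bunch-crossing interval), and shows why even a small model overwhelms DRAM:

```python
def required_weight_bandwidth_gbps(num_params: int,
                                   bytes_per_param: float,
                                   period_ns: float) -> float:
    """Bandwidth (GB/s) needed to stream every weight once per inference."""
    bytes_per_inference = num_params * bytes_per_param
    return bytes_per_inference / period_ns  # 1 byte/ns == 1 GB/s

# Hypothetical example: a small 100k-parameter model with 16-bit weights,
# serving one inference every 25 ns (a 40 MHz event rate).
bw = required_weight_bandwidth_gbps(num_params=100_000,
                                    bytes_per_param=2,
                                    period_ns=25.0)
print(f"required weight bandwidth: {bw:,.0f} GB/s")  # -> 8,000 GB/s
print("vs. one DDR4-3200 channel: ~25 GB/s")
```

Under these assumptions, even a 100k-parameter toy model would need about 8 TB/s of weight bandwidth, orders of magnitude beyond a single DRAM channel, which is why the abstract concludes that all parameters must reside on-chip.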
