A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator (2404.15823v1)

Published 24 Apr 2024 in cs.AR and cs.AI

Abstract: As machine learning applications continue to evolve, the demand for efficient hardware accelerators tailored specifically to deep neural networks (DNNs) becomes increasingly vital. In this paper, we propose a configurable memory hierarchy framework tailored to the per-layer adaptive memory access patterns of DNNs. The hierarchy requests data on demand from off-chip memory and provides it to the accelerator's compute units. The objective is to strike an optimal balance between minimizing the required memory capacity and maintaining high accelerator performance. The framework is characterized by its configurability, allowing the creation of a tailored memory hierarchy with up to five levels. Furthermore, the framework incorporates an optional shift register as the final level to increase the flexibility of the memory management process. A comprehensive loop-nest analysis of DNN layers shows that the framework can efficiently execute the access patterns of most loop unrolls. Synthesis results and a case study of the DNN accelerator UltraTrail indicate a possible reduction in chip area of up to 62.2%, as smaller memory modules can be used, while the performance loss is kept to 2.4%.
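
The hierarchy's central idea is to fetch data on demand from off-chip memory into small on-chip buffers that serve the compute units. As a rough illustration only (not the paper's framework), the Python sketch below models a single hypothetical buffer level in front of an off-chip memory and counts the off-chip reads needed by a toy 1-D convolution loop nest; the class and variable names (OnDemandBuffer, OffChip) are invented for this example.

```python
from collections import OrderedDict

class OnDemandBuffer:
    """One hypothetical hierarchy level holding `capacity` words, filled on demand."""
    def __init__(self, capacity, backing):
        self.capacity = capacity      # words this level can hold
        self.backing = backing        # next level (here: off-chip memory) to fetch from
        self.store = OrderedDict()    # address -> word, oldest entry evicted first
        self.fetches = 0              # requests forwarded to the backing store

    def read(self, addr):
        if addr in self.store:        # hit: serve locally, no off-chip traffic
            return self.store[addr]
        self.fetches += 1             # miss: request the word on demand
        word = self.backing.read(addr)
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # evict the oldest buffered word
        self.store[addr] = word
        return word

class OffChip:
    """Backing off-chip memory; every read here is an external access."""
    def __init__(self, data):
        self.data = data
        self.reads = 0
    def read(self, addr):
        self.reads += 1
        return self.data[addr]

# Toy 1-D convolution: output o needs inputs o .. o+K-1, so neighbouring
# outputs overlap in K-1 inputs that a small buffer can reuse.
K, N = 3, 16
off_chip = OffChip(list(range(N + K - 1)))
l1 = OnDemandBuffer(capacity=K, backing=off_chip)
outputs = [sum(l1.read(o + k) for k in range(K)) for o in range(N)]

print("off-chip reads with buffer:   ", off_chip.reads)  # N + K - 1 = 18
print("off-chip reads without buffer:", N * K)            # 48
```

With a buffer of only K words, the overlapping inputs of neighbouring outputs are reused, so off-chip traffic drops from N*K reads to N+K-1; this capacity-versus-traffic trade-off is the kind of balance the per-layer configurable hierarchy described in the abstract is meant to strike.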

References (19)
  1. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
  2. P. P. Bernardo, C. Gerum, A. Frischknecht, K. Lübeck, and O. Bringmann, “Ultratrail: A configurable ultralow-power tc-resnet ai accelerator for efficient keyword spotting,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 4240–4251, 2020.
  3. A. Kyriakos, V. Kitsakis, A. Louropoulos, E.-A. Papatheofanous, I. Patronas, and D. Reisis, “High performance accelerator for cnn applications,” in 2019 29th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). IEEE, 2019, pp. 135–140.
  4. T. S. Ajani, A. L. Imoize, and A. A. Atayero, “An overview of machine learning within embedded and mobile devices–optimizations and applications,” Sensors, vol. 21, no. 13, p. 4412, 2021.
  5. K. Siu, D. M. Stuart, M. Mahmoud, and A. Moshovos, “Memory requirements for convolutional neural network hardware accelerators,” in 2018 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2018, pp. 111–121.
  6. A. D. Pimentel, “Exploring exploration: A tutorial introduction to embedded systems design space exploration,” IEEE Design & Test, vol. 34, no. 1, pp. 77–90, 2016.
  7. L. Mei, P. Houshmand, V. Jain, S. Giraldo, and M. Verhelst, “Zigzag: A memory-centric rapid dnn accelerator design space exploration framework,” arXiv preprint arXiv:2007.11360, 2020.
  8. L. Mei, P. Houshmand, V. Jain, S. Giraldo, and M. Verhelst, “Zigzag: Enlarging joint architecture-mapping design space exploration for dnn accelerators,” IEEE Transactions on Computers, vol. 70, no. 8, pp. 1160–1174, 2021.
  9. C. Li, Y. Yang, M. Feng, S. Chakradhar, and H. Zhou, “Optimizing memory efficiency for deep convolutional neural networks on gpus,” in SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2016, pp. 633–644.
  10. X. Li, J. Li, and G. Yan, “Optimizing memory efficiency for deep convolutional neural network accelerators,” Journal of Low Power Electronics, vol. 14, no. 4, pp. 496–507, 2018.
  11. R. Rajsuman, “Design and test of large embedded memories: An overview,” IEEE Design & Test of Computers, vol. 18, no. 03, pp. 16–27, 2001.
  12. B. Gunasekaran, “8 types of memory every embedded engineer should know about!” 6 2019. [Online]. Available: https://embeddedinventor.com/8-types-of-memory-every-embedded-engineer-should-know-about/#Secondary_Memory_types
  13. S. Choi, S. Seo, B. Shin, H. Byun, M. Kersner, B. Kim, D. Kim, and S. Ha, “Temporal convolution for real-time keyword spotting on mobile devices,” arXiv preprint arXiv:1904.03814, 2019.
  14. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
  15. B. Jang, D. Schaa, P. Mistry, and D. Kaeli, “Exploiting memory access patterns to improve memory performance in data-parallel architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 105–118, 2010.
  16. R. Das and T. Krishna, “Dnn accelerator architecture - simd or systolic?” 9 2018. [Online]. Available: https://www.sigarch.org/dnn-accelerator-architecture-simd-or-systolic/
  17. R. Lerch, B. Hosseini, P. Gembaczka, G. A. Fink, A. Lüdecke, V. Brack, F. Ercan, A. Utz, and K. Seidl, “Design of an artificial neural network circuit for detecting atrial fibrillation in ecg signals,” in 2021 IEEE Sensors. IEEE, 2021, pp. 1–4.
  18. Cocotb homepage. [Online]. Available: https://www.cocotb.org/
  19. S. Griffiths, “bitstring,” https://github.com/scott-griffiths/bitstring, 4 2023.
Authors (3)
  1. Oliver Bause (2 papers)
  2. Paul Palomero Bernardo (4 papers)
  3. Oliver Bringmann (34 papers)
