A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator (2404.15823v1)
Abstract: As machine learning applications continue to evolve, the demand for efficient hardware accelerators tailored specifically to deep neural networks (DNNs) becomes increasingly vital. In this paper, we propose a configurable memory hierarchy framework tailored to the per-layer adaptive memory access patterns of DNNs. The hierarchy requests data on demand from the off-chip memory and provides it to the accelerator's compute units. The objective is to strike an optimal balance between minimizing the required memory capacity and maintaining high accelerator performance. The framework is configurable, allowing the creation of a tailored memory hierarchy with up to five levels. Furthermore, it incorporates an optional shift register as the final level to increase the flexibility of the memory management process. A comprehensive loop-nest analysis of DNN layers shows that the framework can efficiently execute the access patterns of most loop unrolls. Synthesis results and a case study of the DNN accelerator UltraTrail indicate a possible reduction in chip area of up to 62.2%, since smaller memory modules can be used, while the accompanying performance loss can be kept as low as 2.4%.
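The paper describes the hierarchy at the hardware level and publishes no software interface, but the configuration space the abstract outlines (one to five buffer levels, plus an optional shift register as the final stage feeding the compute units) can be sketched compactly. The following Python sketch is purely illustrative: every name in it (`HierarchyConfig`, `MemoryLevel`, `ShiftRegister`, `depth_words`, and so on) is a hypothetical stand-in, not the authors' API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryLevel:
    """One on-chip buffer level between off-chip memory and the compute units."""
    name: str
    depth_words: int  # capacity of this level in words
    word_bits: int    # word width in bits


@dataclass
class ShiftRegister:
    """Optional final level: a shift register for fine-grained operand reuse."""
    length: int       # number of register stages
    word_bits: int


@dataclass
class HierarchyConfig:
    """Per-layer hierarchy: up to five levels plus an optional shift register."""
    levels: List[MemoryLevel] = field(default_factory=list)
    shift_register: Optional[ShiftRegister] = None

    def __post_init__(self) -> None:
        # The framework supports between one and five memory levels.
        if not 1 <= len(self.levels) <= 5:
            raise ValueError("hierarchy must have between one and five levels")


# Example: a small two-level hierarchy for one convolutional layer,
# with a 9-stage shift register (e.g., a 3x3 window) feeding the datapath.
layer_cfg = HierarchyConfig(
    levels=[
        MemoryLevel("L1_feature_buffer", depth_words=1024, word_bits=8),
        MemoryLevel("L0_line_buffer", depth_words=64, word_bits=8),
    ],
    shift_register=ShiftRegister(length=9, word_bits=8),
)
```

A per-layer configuration of this shape is what makes the abstract's trade-off tunable: shrinking `depth_words` at each level reduces the required on-chip memory (and hence chip area), at the cost of more on-demand fetches from off-chip memory and thus potential stall cycles.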
- A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
- P. P. Bernardo, C. Gerum, A. Frischknecht, K. Lübeck, and O. Bringmann, “UltraTrail: A configurable ultralow-power TC-ResNet AI accelerator for efficient keyword spotting,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 4240–4251, 2020.
- A. Kyriakos, V. Kitsakis, A. Louropoulos, E.-A. Papatheofanous, I. Patronas, and D. Reisis, “High performance accelerator for CNN applications,” in 2019 29th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). IEEE, 2019, pp. 135–140.
- T. S. Ajani, A. L. Imoize, and A. A. Atayero, “An overview of machine learning within embedded and mobile devices–optimizations and applications,” Sensors, vol. 21, no. 13, p. 4412, 2021.
- K. Siu, D. M. Stuart, M. Mahmoud, and A. Moshovos, “Memory requirements for convolutional neural network hardware accelerators,” in 2018 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2018, pp. 111–121.
- A. D. Pimentel, “Exploring exploration: A tutorial introduction to embedded systems design space exploration,” IEEE Design & Test, vol. 34, no. 1, pp. 77–90, 2016.
- L. Mei, P. Houshmand, V. Jain, S. Giraldo, and M. Verhelst, “ZigZag: A memory-centric rapid DNN accelerator design space exploration framework,” arXiv preprint arXiv:2007.11360, 2020.
- ——, “ZigZag: Enlarging joint architecture-mapping design space exploration for DNN accelerators,” IEEE Transactions on Computers, vol. 70, no. 8, pp. 1160–1174, 2021.
- C. Li, Y. Yang, M. Feng, S. Chakradhar, and H. Zhou, “Optimizing memory efficiency for deep convolutional neural networks on GPUs,” in SC ’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2016, pp. 633–644.
- X. Li, J. Li, and G. Yan, “Optimizing memory efficiency for deep convolutional neural network accelerators,” Journal of Low Power Electronics, vol. 14, no. 4, pp. 496–507, 2018.
- R. Rajsuman, “Design and test of large embedded memories: An overview,” IEEE Design & Test of Computers, vol. 18, no. 3, pp. 16–27, 2001.
- B. Gunasekaran, “8 types of memory every embedded engineer should know about!” Jun. 2019. [Online]. Available: https://embeddedinventor.com/8-types-of-memory-every-embedded-engineer-should-know-about/#Secondary_Memory_types
- S. Choi, S. Seo, B. Shin, H. Byun, M. Kersner, B. Kim, D. Kim, and S. Ha, “Temporal convolution for real-time keyword spotting on mobile devices,” arXiv preprint arXiv:1904.03814, 2019.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
- B. Jang, D. Schaa, P. Mistry, and D. Kaeli, “Exploiting memory access patterns to improve memory performance in data-parallel architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 105–118, 2010.
- R. Das and T. Krishna, “DNN accelerator architecture - SIMD or systolic?” Sep. 2018. [Online]. Available: https://www.sigarch.org/dnn-accelerator-architecture-simd-or-systolic/
- R. Lerch, B. Hosseini, P. Gembaczka, G. A. Fink, A. Lüdecke, V. Brack, F. Ercan, A. Utz, and K. Seidl, “Design of an artificial neural network circuit for detecting atrial fibrillation in ECG signals,” in 2021 IEEE Sensors. IEEE, 2021, pp. 1–4.
- Cocotb homepage. [Online]. Available: https://www.cocotb.org/
- S. Griffiths, “bitstring,” Apr. 2023. [Online]. Available: https://github.com/scott-griffiths/bitstring
Authors: Oliver Bause, Paul Palomero Bernardo, Oliver Bringmann