
SAFFIRA: a Framework for Assessing the Reliability of Systolic-Array-Based DNN Accelerators (2403.02946v1)

Published 5 Mar 2024 in cs.AI, cs.AR, and cs.LG

Abstract: Systolic arrays have emerged as a prominent architecture for Deep Neural Network (DNN) hardware accelerators, providing the high-throughput, low-latency performance essential for deploying DNNs across diverse applications. However, when accelerators are used in safety-critical applications, reliability assessment is mandatory to guarantee their correct behavior. While fault injection stands out as a well-established, practical, and robust method for reliability assessment, it remains a very time-consuming process. This paper addresses that time-efficiency issue by introducing a novel hierarchical, software-based, hardware-aware fault injection strategy tailored to systolic-array-based DNN accelerators.
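To make the idea of software-based, hardware-aware fault injection concrete, the sketch below emulates an output-stationary systolic-array matrix multiply in which one processing element (PE) suffers a stuck-at-1 bit in its accumulator. This is a minimal illustration of the general technique, not SAFFIRA's actual API; the function name, the `faulty_pe` parameter, and the stuck-at model are assumptions for illustration.

```python
# Minimal sketch of hardware-aware fault injection for a systolic-array-style
# matrix multiply. Illustrative only; not SAFFIRA's implementation.

def matmul_with_fault(A, B, faulty_pe=None, stuck_bit=None):
    """Multiply integer matrices A (m x k) and B (k x n), emulating an
    output-stationary systolic array: PE (r, c) accumulates C[r][c].
    If faulty_pe=(r, c) is given, that PE's accumulator has bit
    `stuck_bit` stuck at 1 after every MAC operation."""
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0] * n for _ in range(m)]
    for r in range(m):
        for c in range(n):
            acc = 0
            for t in range(k):
                acc += A[r][t] * B[t][c]        # one MAC step of PE (r, c)
                if faulty_pe == (r, c) and stuck_bit is not None:
                    acc |= 1 << stuck_bit       # stuck-at-1 on the accumulator bit
            C[r][c] = acc
    return C
```

Comparing a faulty run against the golden (fault-free) output localizes the error to the outputs computed by the faulty PE, which is the kind of hardware-aware error mapping that lets a software-level injector stand in for slow RTL-level fault simulation.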

Authors (8)
  1. Mahdi Taheri (17 papers)
  2. Masoud Daneshtalab (24 papers)
  3. Jaan Raik (26 papers)
  4. Maksim Jenihhin (31 papers)
  5. Salvatore Pappalardo (1 paper)
  6. Paul Jimenez (5 papers)
  7. Bastien Deveautour (1 paper)
  8. Alberto Bosio (8 papers)
Citations (2)
