Special Session: Reliability Analysis for ML/AI Hardware (2103.12166v2)
Abstract: AI and Machine Learning (ML) are becoming pervasive in today's applications, such as autonomous vehicles, healthcare, aerospace, cybersecurity, and many other critical applications. Ensuring the reliability and robustness of the underlying AI/ML hardware is therefore of paramount importance. In this paper, we explore and evaluate the reliability of different AI/ML hardware. The first section outlines the reliability issues in a commercial systolic array-based ML accelerator in the presence of faults arising from device-level non-idealities in the DRAM. Next, we quantify the impact of circuit-level faults in the MSB and LSB logic cones of the Multiply and Accumulate (MAC) block of the AI accelerator on AI/ML accuracy. Finally, we present two key reliability issues in emerging neuromorphic hardware platforms, circuit aging and endurance, along with our system-level approach to mitigating them.
- Shamik Kundu (9 papers)
- Kanad Basu (23 papers)
- Mehdi Sadi (9 papers)
- Twisha Titirsha (8 papers)
- Shihao Song (22 papers)
- Anup Das (48 papers)
- Ujjwal Guin (14 papers)