LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators (2312.03146v1)
Abstract: In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism, but this approach faces two challenges: a highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method that jointly applies layer replication and mixed-precision quantization to improve the performance of DNNs mapped to area-constrained NVM-based IMC accelerators. LRMP uses a combination of reinforcement learning and integer linear programming to search the replication-quantization design space, guided by a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.8-9$\times$ improvement in latency and 11.8-19$\times$ improvement in throughput at iso-accuracy.
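To make the abstract's ILP-based search concrete, the sketch below shows one way such a replication-quantization selection could be linearized and solved. It is a minimal illustration only, assuming the PuLP package and entirely hypothetical per-layer workloads, tile counts, and precision options; it is not the authors' exact formulation or hardware model.

```python
# Illustrative ILP sketch: pick a replication factor and weight precision per layer
# to minimize the pipeline bottleneck latency under a crossbar-tile budget.
# All numbers are hypothetical placeholders, not values from the LRMP paper.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, LpStatus, value

# Hypothetical per-layer workload (arbitrary units) and tile counts at 8-bit weights.
work = {"conv1": 100.0, "conv2": 400.0, "conv3": 250.0, "fc": 50.0}
tiles_8b = {"conv1": 2, "conv2": 8, "conv3": 6, "fc": 4}

repl_options = [1, 2, 4, 8]          # candidate replication factors
prec_options = {8: 1.0, 4: 0.5}      # weight precision -> relative tile cost
tile_budget = 40                      # total crossbar tiles available

prob = LpProblem("lrmp_sketch", LpMinimize)

# x[l, r, b] = 1 if layer l uses replication factor r at precision b.
x = {(l, r, b): LpVariable(f"x_{l}_{r}_{b}", cat=LpBinary)
     for l in work for r in repl_options for b in prec_options}
T = LpVariable("bottleneck_latency", lowBound=0)

prob += T  # objective: minimize the slowest (bottleneck) layer's latency

for l in work:
    # each layer picks exactly one (replication, precision) configuration
    prob += lpSum(x[l, r, b] for r in repl_options for b in prec_options) == 1
    # bottleneck latency must cover this layer's latency (work divided by replicas)
    prob += T >= lpSum(x[l, r, b] * (work[l] / r)
                       for r in repl_options for b in prec_options)

# total tile usage (replicas times precision-scaled tiles) must fit the budget
prob += lpSum(x[l, r, b] * (r * tiles_8b[l] * prec_options[b])
              for l in work for r in repl_options for b in prec_options) <= tile_budget

prob.solve()
print("status:", LpStatus[prob.status], "bottleneck latency:", value(T))
for (l, r, b), var in x.items():
    if var.value() is not None and var.value() > 0.5:
        print(f"{l}: replicate x{r}, {b}-bit weights")
```

Enumerating a discrete set of (replication, precision) choices per layer keeps the formulation linear; in LRMP, reinforcement learning additionally explores the quantization space while the hardware-informed model supplies the latency and area terms.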
Authors: Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan