LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators (2312.03146v1)
Abstract: In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism, but this approach faces two challenges: a highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method that jointly applies layer replication and mixed-precision quantization to improve the performance of DNNs mapped to area-constrained NVM-based IMC accelerators. LRMP uses a combination of reinforcement learning and integer linear programming to search the replication-quantization design space, guided by a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.8-9$\times$ improvement in latency and 11.8-19$\times$ improvement in throughput at iso-accuracy.
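To make the abstract's ILP-based search concrete, the sketch below shows one way such a replication-quantization selection could be linearized and solved. It is a minimal illustration only, assuming the PuLP package and entirely hypothetical per-layer workloads, tile counts, and precision options; it is not the authors' exact formulation or hardware model.

```python
# Illustrative ILP sketch: pick a replication factor and weight precision per layer
# to minimize the pipeline bottleneck latency under a crossbar-tile budget.
# All numbers are hypothetical placeholders, not values from the LRMP paper.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, LpStatus, value

# Hypothetical per-layer workload (arbitrary units) and tile counts at 8-bit weights.
work = {"conv1": 100.0, "conv2": 400.0, "conv3": 250.0, "fc": 50.0}
tiles_8b = {"conv1": 2, "conv2": 8, "conv3": 6, "fc": 4}

repl_options = [1, 2, 4, 8]          # candidate replication factors
prec_options = {8: 1.0, 4: 0.5}      # weight precision -> relative tile cost
tile_budget = 40                      # total crossbar tiles available

prob = LpProblem("lrmp_sketch", LpMinimize)

# x[l, r, b] = 1 if layer l uses replication factor r at precision b.
x = {(l, r, b): LpVariable(f"x_{l}_{r}_{b}", cat=LpBinary)
     for l in work for r in repl_options for b in prec_options}
T = LpVariable("bottleneck_latency", lowBound=0)

prob += T  # objective: minimize the slowest (bottleneck) layer's latency

for l in work:
    # each layer picks exactly one (replication, precision) configuration
    prob += lpSum(x[l, r, b] for r in repl_options for b in prec_options) == 1
    # bottleneck latency must cover this layer's latency (work divided by replicas)
    prob += T >= lpSum(x[l, r, b] * (work[l] / r)
                       for r in repl_options for b in prec_options)

# total tile usage (replicas times precision-scaled tiles) must fit the budget
prob += lpSum(x[l, r, b] * (r * tiles_8b[l] * prec_options[b])
              for l in work for r in repl_options for b in prec_options) <= tile_budget

prob.solve()
print("status:", LpStatus[prob.status], "bottleneck latency:", value(T))
for (l, r, b), var in x.items():
    if var.value() is not None and var.value() > 0.5:
        print(f"{l}: replicate x{r}, {b}-bit weights")
```

Enumerating a discrete set of (replication, precision) choices per layer keeps the formulation linear; in LRMP, reinforcement learning additionally explores the quantization space while the hardware-informed model supplies the latency and area terms.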
Authors: Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan