LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators (2312.03146v1)

Published 5 Dec 2023 in cs.AR

Abstract: In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained NVM-based IMC accelerators. LRMP uses a combination of reinforcement learning and integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.8-9$\times$ latency and 11.8-19$\times$ throughput improvement at iso-accuracy.
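
As an aid to intuition, the minimal sketch below brute-forces a toy version of the replication-quantization design space the abstract describes: each layer picks a weight precision and a replication factor, the total crossbar area must stay under a budget, and the objective is the bottleneck latency of the spatially mapped pipeline. The layer sizes, area and latency models, and the omission of an accuracy constraint are all illustrative assumptions; the paper's actual method searches this space with reinforcement learning and integer linear programming driven by a hardware-informed model.

```python
# Illustrative sketch only: exhaustive search over a toy joint
# layer-replication / per-layer-precision space. The workload numbers,
# area model, and latency model are hypothetical placeholders, and the
# accuracy impact of quantization is ignored for brevity.

from itertools import product

# Hypothetical per-layer workload (MACs) and weight counts for a tiny DNN.
LAYERS = [
    {"name": "conv1", "macs": 1.0e8, "weights": 5.0e4},
    {"name": "conv2", "macs": 4.0e8, "weights": 2.0e5},
    {"name": "fc",    "macs": 0.5e8, "weights": 1.0e6},
]
PRECISIONS = [4, 8]          # candidate weight bit-widths per layer
REPLICATIONS = [1, 2, 4]     # candidate replication factors per layer
AREA_BUDGET = 6.0e6          # crossbar cell budget (arbitrary units)

def area(layer, bits, reps):
    # Cells needed scale with weights * bits, and replication multiplies them.
    return layer["weights"] * bits * reps

def latency(layer, reps):
    # A replicated layer processes inputs in parallel, dividing its latency.
    return layer["macs"] / reps

best = None
for choice in product(product(PRECISIONS, REPLICATIONS), repeat=len(LAYERS)):
    total_area = sum(area(l, b, r) for l, (b, r) in zip(LAYERS, choice))
    if total_area > AREA_BUDGET:
        continue  # violates the area constraint
    # In a spatially mapped pipeline, throughput is set by the slowest layer.
    bottleneck = max(latency(l, r) for l, (_, r) in zip(LAYERS, choice))
    if best is None or bottleneck < best[0]:
        best = (bottleneck, choice)

print("bottleneck latency:", best[0])
for l, (b, r) in zip(LAYERS, best[1]):
    print(f"{l['name']}: {b}-bit weights, {r}x replication")
```

The sketch makes the central tradeoff visible: lowering a layer's precision frees crossbar area, which can then be spent replicating the slowest layers to balance the pipeline.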

Authors (4)
  1. Abinand Nallathambi (2 papers)
  2. Christin David Bose (1 paper)
  3. Wilfried Haensch (18 papers)
  4. Anand Raghunathan (37 papers)
Citations (1)
