Pruning for Improved ADC Efficiency in Crossbar-based Analog In-memory Accelerators (2403.13082v1)
Abstract: Deep learning has proven successful in many applications but suffers from high computational demands, requiring custom accelerators for deployment. Crossbar-based analog in-memory architectures are attractive for accelerating deep neural networks (DNNs) because combining storage and computation in memory enables high data reuse and high efficiency. However, they require analog-to-digital converters (ADCs) to communicate crossbar outputs. ADCs consume a significant portion of the energy and area of every crossbar processing unit, diminishing the potential efficiency benefits. Pruning is a well-studied technique for improving DNN efficiency, but it requires modifications to be effective on crossbars. In this paper, we motivate crossbar-attuned pruning that targets ADC-specific inefficiencies. This is achieved by identifying three key properties (dubbed D.U.B.) that induce sparsity which can be exploited to reduce ADC energy without sacrificing accuracy. The first property ensures that sparsity translates effectively to hardware efficiency by restricting sparsity levels to Discrete powers of 2. The other two properties encourage columns in the same crossbar to achieve both Unstructured and Balanced sparsity in order to amortize the accuracy drop. The desired D.U.B. sparsity is then achieved by regularizing the variance of the $L_{0}$ norms of neighboring columns within the same crossbar. Our proposed implementation allows the regularizer to be used directly in end-to-end gradient-based training. We apply the proposed algorithm to the convolutional layers of VGG11 and ResNet18 models trained on CIFAR-10 and ImageNet, and achieve up to 7.13x and 1.27x improvement in ADC energy, respectively, with less than 1% drop in accuracy.
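The abstract describes the D.U.B. regularizer only at a high level. The sketch below illustrates one plausible reading of the column-balance part: it assumes a tanh-based differentiable surrogate for the column-wise $L_0$ norm, a 128x128 crossbar tiling, and the helper name `dub_variance_regularizer`, none of which are taken from the paper; it is a minimal sketch, not the authors' implementation.

```python
import torch


def dub_variance_regularizer(weight: torch.Tensor,
                             xbar_rows: int = 128,
                             xbar_cols: int = 128,
                             beta: float = 10.0) -> torch.Tensor:
    """Penalize imbalance of approximate column-wise L0 norms inside each crossbar tile.

    weight : 2-D weight matrix of shape (out_features, in_features); convolution
             kernels are assumed to be flattened to this shape before the call.
    xbar_rows / xbar_cols : crossbar tile dimensions (hypothetical 128x128 default).
    beta   : sharpness of the smooth |w| -> {0, 1} surrogate standing in for the
             non-differentiable L0 count (an assumption, not the paper's relaxation).
    """
    out_dim, in_dim = weight.shape
    assert in_dim % xbar_rows == 0 and out_dim % xbar_cols == 0, \
        "pad or tile the layer so it maps onto whole crossbars"

    # Crossbar orientation: rows carry inputs, columns carry outputs, so each
    # crossbar column holds one output neuron's slice of xbar_rows weights.
    wt = weight.t().contiguous()                      # (in_dim, out_dim)
    tiles = wt.view(in_dim // xbar_rows, xbar_rows,
                    out_dim // xbar_cols, xbar_cols)  # (row-tile, row, col-tile, col)

    # Differentiable surrogate for each column's L0 norm (count of nonzeros).
    soft_l0 = torch.tanh(beta * tiles.abs()).sum(dim=1)  # (row-tile, col-tile, col)

    # Variance of neighboring column norms within each crossbar; summing over
    # all tiles gives one scalar that pushes columns toward balanced sparsity.
    return soft_l0.var(dim=-1, unbiased=False).sum()


# Usage sketch: add the penalty (scaled by a hyperparameter) to the task loss.
# reg = sum(dub_variance_regularizer(m.weight.flatten(1))
#           for m in model.modules() if isinstance(m, torch.nn.Conv2d))
# loss = criterion(model(x), y) + lmbda * reg
```

Because the penalty is a smooth function of the weights, it can be added directly to the training objective, which is consistent with the abstract's claim of end-to-end gradient-based training; the discrete power-of-2 sparsity targets would require additional machinery not shown here.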
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE CVPR, 2016, pp. 770–778.
- E. Frantar and D. Alistarh, “SparseGPT: Massive language models can be accurately pruned in one-shot,” in ICML, vol. 202. PMLR, 2023, pp. 10323–10337.
- Z. Zong, G. Song, and Y. Liu, “DETRs with collaborative hybrid assignments training,” in IEEE/CVF ICCV, 2023, pp. 6748–6758.
- S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” ICLR, 2016.
- F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G.-J. Nam et al., “TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip,” IEEE TCAD, vol. 34, no. 10, pp. 1537–1557, 2015.
- A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar, “ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars,” ACM SIGARCH Computer Architecture News, vol. 44, no. 3, 2016.
- P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, “PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory,” ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 27–39, 2016.
- L. Song, X. Qian, H. Li, and Y. Chen, “PipeLayer: A pipelined ReRAM-based accelerator for deep learning,” in 2017 IEEE HPCA. IEEE, 2017, pp. 541–552.
- A. Ankit, I. E. Hajj, S. R. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, W.-M. W. Hwu, J. P. Strachan, K. Roy et al., “PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference,” in ASPLOS, 2019, pp. 715–731.
- A. Ankit, I. El Hajj, S. R. Chalamalasetti, S. Agarwal, M. Marinella, M. Foltin, J. P. Strachan, D. Milojicic, W.-M. Hwu, and K. Roy, “PANTHER: A programmable architecture for neural network training harnessing energy-efficient ReRAM,” IEEE Transactions on Computers, vol. 69, no. 8, pp. 1128–1142, 2020.
- W. Wan, R. Kubendran, C. Schaefer, S. B. Eryilmaz, W. Zhang, D. Wu, S. Deiss, P. Raina, H. Qian, B. Gao et al., “A compute-in-memory chip based on resistive random-access memory,” Nature, vol. 608, no. 7923, pp. 504–512, 2022.
- T.-H. Yang, H.-Y. Cheng, C.-L. Yang, I.-C. Tseng, H.-W. Hu, H.-S. Chang, and H.-P. Li, “Sparse ReRAM engine: Joint exploration of activation and weight sparsity in compressed neural networks,” in ISCA, 2019, pp. 236–249.
- Z. Chen, X. Chen, and J. Gu, “15.3 A 65nm 3T dynamic analog RAM-based computing-in-memory macro and CNN accelerator with retention enhancement, adaptive analog sparsity and 44TOPS/W system energy efficiency,” in 2021 IEEE ISSCC, vol. 64. IEEE, 2021, pp. 240–242.
- M. Ali, I. Chakraborty, U. Saxena, A. Agrawal, A. Ankit, and K. Roy, “A 35.5–127.2 TOPS/W dynamic sparsity-aware reconfigurable-precision compute-in-memory SRAM macro for machine learning,” IEEE Solid-State Circuits Letters, vol. 4, pp. 129–132, 2021.
- D. E. Kim, A. Ankit, C. Wang, and K. Roy, “Samba: Sparsity aware in-memory computing based machine learning accelerator,” IEEE Transactions on Computers, 2023.
- C. Ogbogu, M. Soumen, B. K. Joardar, J. R. Doppa, D. Heo, K. Chakrabarty, and P. P. Pande, “Energy-efficient ReRAM-based ML training via mixed pruning and reconfigurable ADC,” in 2023 IEEE/ACM ISLPED. IEEE, 2023, pp. 1–6.
- K. Roy, I. Chakraborty, M. Ali, A. Ankit, and A. Agrawal, “In-memory computing in emerging memory technologies for machine learning: an overview,” in 2020 57th ACM/IEEE DAC. IEEE, 2020, pp. 1–6.
- I. Chakraborty, M. Ali, A. Ankit, S. Jain, S. Roy, S. Sridharan, A. Agrawal, A. Raghunathan, and K. Roy, “Resistive crossbars as approximate hardware building blocks for machine learning: Opportunities and challenges,” Proceedings of the IEEE, vol. 108, no. 12, pp. 2276–2310, 2020.
- S. Huang, A. Ankit, P. Silveira, R. Antunes, S. R. Chalamalasetti, I. El Hajj, D. E. Kim, G. Aguiar, P. Bruel, S. Serebryakov et al., “Mixed precision quantization for ReRAM-based DNN inference accelerators,” in 2021 26th ASP-DAC. IEEE, 2021, pp. 372–377.
- D. Blalock, J. J. Gonzalez Ortiz, J. Frankle, and J. Guttag, “What is the state of neural network pruning?” in Proceedings of Machine Learning and Systems, I. Dhillon, D. Papailiopoulos, and V. Sze, Eds., vol. 2, 2020, pp. 129–146.
- S. Han, J. Pool, J. Tran, and W. J. Dally, “Learning both weights and connections for efficient neural network,” in NeurIPS, 2015.
- T. Zhang, S. Ye, K. Zhang, J. Tang, W. Wen, M. Fardad, and Y. Wang, “A systematic DNN weight pruning framework using alternating direction method of multipliers,” in ECCV, 2018, pp. 184–199.
- N. Rathi, P. Panda, and K. Roy, “STDP-based pruning of connections and weight quantization in spiking neural networks for energy-efficient recognition,” IEEE TCAD, vol. 38, no. 4, pp. 668–677, 2018.
- W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, “Learning structured sparsity in deep neural networks,” NeurIPS, vol. 29, pp. 2074–2082, 2016.
- H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning filters for efficient convnets,” ICLR, 2017.
- I. Garg, P. Panda, and K. Roy, “A low effort approach to structured CNN design using PCA,” IEEE Access, vol. 8, pp. 1347–1360, 2019.
- L. Yang, Z. He, and D. Fan, “Harmonious coexistence of structured weight pruning and ternarization for deep neural networks,” in AAAI, vol. 34, no. 04, 2020, pp. 6623–6630.
- S. A. Aketi, S. Roy, A. Raghunathan, and K. Roy, “Gradual channel pruning while training using feature relevance scores for convolutional neural networks,” IEEE Access, vol. 8, pp. 171924–171932, 2020.
- C. Chu, Y. Wang, Y. Zhao, X. Ma, S. Ye, Y. Hong, X. Liang, Y. Han, and L. Jiang, “PIM-Prune: Fine-grain DCNN pruning for crossbar-based process-in-memory architecture,” in ACM/IEEE DAC. IEEE, 2020.
- L. Liang, L. Deng, Y. Zeng, X. Hu, Y. Ji, X. Ma, G. Li, and Y. Xie, “Crossbar-aware neural network pruning,” IEEE Access, vol. 6, pp. 58324–58337, 2018.
- J. Lin, Z. Zhu, Y. Wang, and Y. Xie, “Learning the sparsity for ReRAM: Mapping and pruning sparse neural network for ReRAM-based accelerator,” in 24th ASP-DAC, 2019, pp. 639–644.
- A. Ankit, T. Ibrayev, A. Sengupta, and K. Roy, “TraNNsformer: Clustered pruning on crossbar-based architectures for energy-efficient neural networks,” IEEE TCAD, vol. 39, no. 10, pp. 2361–2374, 2019.
- H. Mao, S. Han, J. Pool, W. Li, X. Liu, Y. Wang, and W. J. Dally, “Exploring the granularity of sparsity in convolutional neural networks,” in IEEE CVPR Workshops, 2017, pp. 13–20.
- G. Yuan, P. Behnam, Y. Cai, A. Shafiee, J. Fu, Z. Liao, Z. Li, X. Ma, J. Deng, J. Wang et al., “TinyADC: Peripheral circuit-aware weight pruning framework for mixed-signal DNN accelerators,” in 2021 DATE. IEEE, 2021, pp. 926–931.
- W. Xue, J. Bai, S. Sun, and W. Kang, “Hierarchical non-structured pruning for computing-in-memory accelerators with reduced ADC resolution requirement,” in 2023 DATE. IEEE, 2023, pp. 1–6.
- H. Yang, W. Wen, and H. Li, “DeepHoyer: Learning sparser neural network with differentiable scale-invariant sparsity measures,” ICLR, 2020.
- B. Murmann, “ADC performance survey 1997-2020,” http://web.stanford.edu/~murmann/adcsurvey.html, 2020.
- A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” University of Toronto, Toronto, ON, Canada, Tech. Rep., 2009.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, pp. 211–252, 2015.
- M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.