Improving the Accuracy of Analog-Based In-Memory Computing Accelerators Post-Training (2401.09859v1)
Abstract: Analog-Based In-Memory Computing (AIMC) inference accelerators can be used to efficiently execute Deep Neural Network (DNN) inference workloads. However, to mitigate accuracy losses due to circuit and device non-idealities, Hardware-Aware (HWA) training methodologies must be employed, and these typically require significant information about the underlying hardware. In this paper, we propose two Post-Training (PT) optimization methods to improve accuracy after training is performed. For each crossbar, the first optimizes the conductance range of each column, and the second optimizes the input, i.e., Digital-to-Analog Converter (DAC), range. It is demonstrated that, when these methods are employed, the complexity during training and the amount of information required about the underlying hardware can be reduced with no notable change in accuracy ($\leq$0.1%) when fine-tuning the pretrained RoBERTa transformer model on all General Language Understanding Evaluation (GLUE) benchmark tasks. Additionally, it is demonstrated that further optimizing the learned parameters post-training improves accuracy.
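To make the two proposed calibration steps concrete, below is a minimal NumPy sketch of post-training range optimization of this general kind. It is not the paper's actual method or hardware model: the uniform-quantization-plus-noise crossbar stand-in, the noise level, the grid-search strategy, and all function and parameter names (`quantize`, `noisy_mvm`, `w_max`, `dac_max`, etc.) are illustrative assumptions. It only shows the shape of the idea: per crossbar, sweep a per-column conductance (weight) range and a shared input (DAC) clipping range, keeping the values that minimize matrix-vector multiplication error on calibration data.

```python
# Hedged sketch of post-training range calibration for a simulated crossbar.
# Assumptions (not from the paper): uniform quantization + Gaussian noise as
# a stand-in for device/circuit non-idealities; grid search over range scales.
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, x_max, n_bits):
    """Clip x to [-x_max, x_max], then uniformly quantize to n_bits."""
    levels = 2 ** (n_bits - 1) - 1
    x = np.clip(x, -x_max, x_max)
    return np.round(x / x_max * levels) / levels * x_max

def noisy_mvm(x, W, w_max_per_col, dac_max, w_bits=4, in_bits=8, noise=0.02):
    """MVM with per-column conductance ranges and a shared DAC input range."""
    Wq = np.stack([quantize(W[:, j], w_max_per_col[j], w_bits)
                   for j in range(W.shape[1])], axis=1)
    Wq = Wq + noise * np.abs(Wq).max() * rng.standard_normal(Wq.shape)
    xq = quantize(x, dac_max, in_bits)
    return xq @ Wq

# Calibration inputs and the ideal (noise-free, full-precision) reference.
W = rng.standard_normal((64, 32))
X = rng.standard_normal((256, 64))
Y_ref = X @ W

scales = np.linspace(0.5, 1.0, 11)

# Step 1: per-column conductance-range search. For each crossbar column,
# sweep a scale on its max-abs weight and keep the error-minimizing value.
w_max = np.abs(W).max(axis=0).copy()
for j in range(W.shape[1]):
    errs = [np.mean((noisy_mvm(X, W[:, [j]], np.array([s * w_max[j]]),
                               np.abs(X).max()) - Y_ref[:, [j]]) ** 2)
            for s in scales]
    w_max[j] *= scales[int(np.argmin(errs))]

# Step 2: input (DAC) range search with the calibrated conductance ranges.
dac_errs = [np.mean((noisy_mvm(X, W, w_max, s * np.abs(X).max()) - Y_ref) ** 2)
            for s in scales]
dac_max = scales[int(np.argmin(dac_errs))] * np.abs(X).max()

print("per-column ranges (first 4):", w_max[:4], "| DAC range:", dac_max)
```

Because both searches only consume calibration activations and the already-trained weights, they run entirely after training, which is what lets the HWA training stage itself need less detail about the target hardware.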