AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC (2402.13798v1)
Abstract: Power consumption has become a major concern in neural network accelerators for edge devices. Non-volatile-memory (NVM) based computing-in-memory (CIM) architectures have shown great potential for better energy efficiency. However, most recent NVM-CIM solutions focus on fixed-point calculation and are not applicable to floating-point (FP) processing. In this paper, we propose an analog-domain floating-point CIM architecture (AFPR-CIM) based on resistive random-access memory (RRAM). A novel adaptive dynamic-range FP-ADC is designed to convert the analog computation results into FP codes. Output current with a high dynamic range is converted to a normalized voltage range for readout, which prevents precision loss while keeping power consumption low. Moreover, a novel FP-DAC is implemented to reconstruct FP digital codes into analog values for analog computation. The proposed AFPR-CIM architecture enables neural network acceleration with FP8 (E2M5) activations for better accuracy and energy efficiency. Evaluation results show that AFPR-CIM achieves an energy efficiency of 19.89 TFLOPS/W and a throughput of 1474.56 GOPS. Compared to a traditional FP8 accelerator, a digital FP-CIM, and an analog INT8-CIM, this work achieves 4.135x, 5.376x, and 2.841x higher energy efficiency, respectively.
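To make the E2M5 notation concrete, the following is a minimal numerical sketch of how a value could be split into a 2-bit exponent and a 5-bit mantissa and then reconstructed. It is an illustration only, not the paper's FP-ADC or FP-DAC circuit. The abstract does not specify the sign bit, exponent bias, or special values, so the code assumes a non-negative input, an exponent bias of zero, an implicit leading one, no subnormals, and saturation at the ends of the range. The function names `encode_e2m5` and `decode_e2m5` are hypothetical.

```python
import math

EXP_BITS = 2  # "E2" in the abstract
MAN_BITS = 5  # "M5" in the abstract

MAX_EXP = (1 << EXP_BITS) - 1    # largest exponent code: 3
MAN_STEPS = 1 << MAN_BITS        # mantissa levels per exponent range: 32
# Largest representable value under the assumed format: (1 + 31/32) * 2**3
MAX_VALUE = (1 + (MAN_STEPS - 1) / MAN_STEPS) * 2 ** MAX_EXP


def encode_e2m5(x: float) -> tuple[int, int]:
    """Quantize x to (exponent, mantissa), assuming value = (1 + m/32) * 2**e.

    Assumed format (not taken from the paper): unsigned, bias 0,
    implicit leading one, no subnormals, saturating at [1.0, 15.75].
    """
    x = min(max(x, 1.0), MAX_VALUE)   # saturate into the representable range
    _, exp = math.frexp(x)            # x = f * 2**exp with 0.5 <= f < 1
    e = exp - 1                       # so x = (2f) * 2**e with 1 <= 2f < 2
    # Normalize into [1, 2) and resolve the fraction with MAN_BITS bits.
    m = round((x / 2 ** e - 1.0) * MAN_STEPS)
    if m == MAN_STEPS:                # rounding carried into the next range
        e, m = e + 1, 0
    return e, m


def decode_e2m5(e: int, m: int) -> float:
    """Reconstruct the value from an (exponent, mantissa) pair."""
    return (1 + m / MAN_STEPS) * 2 ** e


if __name__ == "__main__":
    for value in (1.0, 3.0, 5.3, 7.99, 100.0):
        e, m = encode_e2m5(value)
        print(value, (e, m), decode_e2m5(e, m))
```

Under these assumptions the exponent selects one of four ranges and the mantissa resolves the value inside the normalized range [1, 2), so the relative quantization error stays below 1/64 for any in-range input. This mirrors, at a purely arithmetic level, the idea of mapping a high-dynamic-range quantity to a normalized range before readout.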
Authors:
- Haobo Liu
- Zhengyang Qian
- Wei Wu
- Hongwei Ren
- Zhiwei Liu
- Leibin Ni