AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC (2402.13798v1)

Published 21 Feb 2024 in eess.SY and cs.SY

Abstract: Power consumption has become the major concern in neural network accelerators for edge devices. The novel non-volatile-memory (NVM) based computing-in-memory (CIM) architecture has shown great potential for better energy efficiency. However, most recent NVM-CIM solutions focus mainly on fixed-point calculation and are not applicable to floating-point (FP) processing. In this paper, we propose an analog-domain floating-point CIM architecture (AFPR-CIM) based on resistive random-access memory (RRAM). A novel adaptive dynamic-range FP-ADC is designed to convert the analog computation results into FP codes. Output current with a high dynamic range is converted to a normalized voltage range for readout, to prevent precision loss at low power consumption. Moreover, a novel FP-DAC is also implemented, which reconstructs FP digital codes into analog values to perform analog computation. The proposed AFPR-CIM architecture enables neural network acceleration with FP8 (E2M5) activation for better accuracy and energy efficiency. Evaluation results show that AFPR-CIM can achieve 19.89 TFLOPS/W energy efficiency and 1474.56 GOPS throughput. Compared to a traditional FP8 accelerator, a digital FP-CIM, and an analog INT8-CIM, this work achieves 4.135x, 5.376x, and 2.841x energy efficiency enhancement, respectively.
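
The abstract refers to an FP8 (E2M5) activation format (1 sign bit, 2 exponent bits, 5 mantissa bits) without spelling out its encoding conventions. As a rough illustration only, the Python sketch below shows how activations might be rounded onto such a grid, assuming an IEEE-754-style exponent bias of 1, gradual underflow for small magnitudes, and no exponent codes reserved for infinities or NaN; these assumptions, and the function name quantize_e2m5, are illustrative and not taken from the paper.

```python
import numpy as np

# Hypothetical E2M5 layout: 1 sign bit, 2 exponent bits, 5 mantissa bits.
# Assumed (not specified in the abstract): bias = 1, subnormals for small
# values, no reserved exponent codes, round-to-nearest with clamping.
EXP_BITS, MAN_BITS = 2, 5
BIAS = 2 ** (EXP_BITS - 1) - 1            # = 1 (assumed)
MAX_EXP = (2 ** EXP_BITS - 1) - BIAS      # = 2, largest unbiased exponent
MIN_EXP = 1 - BIAS                        # = 0, smallest normal exponent

def quantize_e2m5(x: np.ndarray) -> np.ndarray:
    """Round real-valued activations to the nearest assumed-E2M5 value."""
    sign = np.sign(x)
    mag = np.abs(x)
    # Exponent of each value, clamped to the representable range.
    exp = np.clip(np.floor(np.log2(np.maximum(mag, 1e-38))), MIN_EXP, MAX_EXP)
    lsb = 2.0 ** (exp - MAN_BITS)         # weight of one mantissa LSB at this exponent
    q = np.round(mag / lsb) * lsb
    max_val = (2 - 2.0 ** -MAN_BITS) * 2.0 ** MAX_EXP  # largest finite magnitude
    return sign * np.minimum(q, max_val)

if __name__ == "__main__":
    acts = np.array([0.037, -1.25, 3.9, 8.0])
    print(quantize_e2m5(acts))            # values snapped to the assumed E2M5 grid
```

Running the script snaps a few sample activations onto the assumed E2M5 grid; this coarse, exponent-scaled rounding is the source of the accuracy/efficiency trade-off that the abstract attributes to FP8 activation processing.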

Authors (6)
  1. Haobo Liu (1 paper)
  2. Zhengyang Qian (1 paper)
  3. Wei Wu (482 papers)
  4. Hongwei Ren (19 papers)
  5. Zhiwei Liu (114 papers)
  6. Leibin Ni (1 paper)
Citations (4)
