
Algorithm-hardware co-design for Energy-Efficient A/D conversion in ReRAM-based accelerators (2402.06164v2)

Published 9 Feb 2024 in cs.AR

Abstract: Deep neural networks (DNNs) are widely deployed across many fields. Owing to the in-situ computation (processing-in-memory) capability of the Resistive Random Access Memory (ReRAM) crossbar, ReRAM-based accelerators show potential for accelerating DNNs with low power and high performance. Despite this power advantage, such accelerators suffer from the high power consumption of peripheral circuits, especially the Analog-to-Digital Converters (ADCs), which account for over 60 percent of total power consumption and hinder ReRAM-based accelerators from achieving higher efficiency. We observe that some Analog-to-Digital conversion operations contribute nothing to maintaining inference accuracy, and such redundant operations can be eliminated by modifying the ADC's search logic. Based on this observation, we propose an algorithm-hardware co-design method that spans both hardware design and quantization algorithms. First, we analyze the output distribution along the crossbar's bit-lines and identify fine-grained redundant ADC sampling bits. To further compress ADC bits, we propose a hardware-friendly quantization method and coding scheme in which different quantization strategies are applied to partial results in different intervals. To support these two features, we propose a lightweight architectural design based on the SAR ADC. Notably, our method is not only more energy efficient but also retains the flexibility of the algorithm. Experiments demonstrate that our method achieves about a $1.6 \sim 2.3 \times$ reduction in ADC power.
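The mechanism behind the savings is easiest to see in a SAR ADC's binary search: each output bit costs one comparator decision and one DAC settling cycle, so any bit whose value is known in advance (for example, MSBs that are always zero because the bit-line's partial sums never reach the upper portion of the input range) can be skipped without changing the digital code. Below is a minimal behavioral sketch of this idea in Python; `sar_adc_convert` and its `msb_skip` knob are hypothetical stand-ins for the paper's distribution-driven redundancy analysis, not its actual search-logic or circuit design.

```python
def sar_adc_convert(v_in, n_bits=8, v_ref=1.0, msb_skip=0):
    """Behavioral model of an n_bits SAR ADC binary search.

    msb_skip: number of leading MSB comparisons to skip. If the
    bit-line's partial-sum distribution guarantees the input never
    exceeds v_ref / 2**msb_skip, those MSBs are always 0 and their
    compare/settle cycles can be elided without accuracy loss.
    (msb_skip is a hypothetical knob illustrating the idea of
    removing redundant ADC sampling bits.)
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        if bit >= n_bits - msb_skip:
            continue  # redundant MSB: known to be 0, no comparison needed
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = trial * v_ref / (1 << n_bits)  # DAC output for the trial code
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit
    return code

# Inputs known to stay below v_ref/4 -> two MSB cycles skipped,
# yet the digital code matches a full 8-bit conversion.
full = sar_adc_convert(0.20, msb_skip=0)
fast = sar_adc_convert(0.20, msb_skip=2)
assert full == fast  # both yield code 51
```

In this toy model, bounding the input below v_ref/4 lets two of eight conversion cycles be elided with an identical output code; the paper's interval-wise quantization pushes the same principle further by coding partial results in different intervals at different resolutions.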

Authors (4)
  1. Chenguang Zhang (11 papers)
  2. Zhihang Yuan (45 papers)
  3. Xingchen Li (33 papers)
  4. Guangyu Sun (47 papers)
