HCiM: ADC-Less Hybrid Analog-Digital Compute in Memory Accelerator for Deep Learning Workloads (2403.13577v1)
Abstract: Analog Compute-in-Memory (CiM) accelerators are increasingly recognized for their efficiency in accelerating Deep Neural Networks (DNNs). However, their dependence on Analog-to-Digital Converters (ADCs) for accumulating partial sums from crossbars incurs substantial power and area overhead. Moreover, the high area cost of ADCs limits how many can be integrated per crossbar, constraining throughput. One approach to mitigating this issue is to adopt extreme low-precision (binary or ternary) quantization of the partial sums; training with such quantization eliminates the need for ADCs. While this strategy effectively removes the ADC cost, it introduces the challenge of managing numerous floating-point scale factors, which are trainable parameters like DNN weights. These scale factors must be multiplied with the binary or ternary outputs at the crossbar columns to preserve system accuracy. To that end, we propose an algorithm-hardware co-design approach in which DNNs are first trained with quantization-aware training. We then introduce HCiM, an ADC-less hybrid analog-digital CiM accelerator. HCiM uses analog CiM crossbars to perform matrix-vector multiplication, coupled with a digital CiM array dedicated to processing the scale factors. The digital CiM array executes both addition and subtraction directly within the memory array, enhancing processing speed, and it exploits the inherent sparsity of ternary quantization for further energy savings. Compared to an analog CiM baseline architecture using 7-bit and 4-bit ADCs, HCiM achieves energy reductions of up to 28% and 12%, respectively.
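The following is a minimal NumPy sketch, not the authors' implementation, of the data flow the abstract describes: a crossbar column accumulates an analog partial sum, an ADC-less comparator stage ternarizes it, and a digital stage applies a per-column trainable scale factor using only additions and subtractions while skipping zero (sparse) outputs. The array sizes, ternarization threshold, and scale values below are illustrative assumptions.

```python
# Sketch of ADC-less ternary partial-sum handling with per-column scale factors.
# All dimensions, thresholds, and scales are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)

ROWS = 64                      # assumed crossbar rows contributing to one partial sum
COLS = 8                       # assumed number of columns sharing the digital scale stage

weights = rng.choice([-1, 0, 1], size=(ROWS, COLS))   # ternary weights mapped to the crossbar
inputs  = rng.integers(0, 2, size=ROWS)               # binary input activations (one bit-slice)
scales  = rng.uniform(0.05, 0.5, size=COLS)           # per-column scale factors (trained with QAT)
THRESH  = 2.0                                         # assumed ternarization threshold

# Analog stage: each column accumulates a value proportional to the dot product.
analog_psum = inputs @ weights                        # shape (COLS,)

# ADC-less readout: comparators ternarize each column output instead of an ADC conversion.
ternary_psum = np.where(analog_psum > THRESH, 1,
               np.where(analog_psum < -THRESH, -1, 0))

# Digital CiM stage: scale factors are applied via addition/subtraction only,
# and zero partial sums (ternary sparsity) are skipped for energy savings.
output = np.zeros(COLS)
for c in np.nonzero(ternary_psum)[0]:                 # process only nonzero columns
    output[c] += scales[c] if ternary_psum[c] > 0 else -scales[c]

print("ternary partial sums:", ternary_psum)
print("scaled outputs      :", output)
```

Because the ternary output restricts each column's contribution to +scale, -scale, or nothing, the digital stage never needs a multiplier, which is what allows the scale-factor processing to be folded into an add/subtract-capable digital CiM array.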
Authors: Shubham Negi, Utkarsh Saxena, Deepika Sharma, Kaushik Roy