PICO-RAM: A PVT-Insensitive Analog Compute-In-Memory SRAM Macro with In-Situ Multi-Bit Charge Computing and 6T Thin-Cell-Compatible Layout (2407.12829v1)

Published 3 Jul 2024 in cs.AR and cs.ET

Abstract: Analog compute-in-memory (CIM) in static random-access memory (SRAM) is promising for accelerating deep learning inference by circumventing the memory wall and exploiting ultra-efficient analog low-precision arithmetic. Latest analog CIM designs attempt bit-parallel schemes for multi-bit analog Matrix-Vector Multiplication (MVM), aiming at higher energy efficiency, throughput, and training simplicity and robustness over conventional bit-serial methods that digitally shift-and-add multiple partial analog computing results. However, bit-parallel operations require more complex analog computations and become more sensitive to well-known analog CIM challenges, including large cell areas, inefficient and inaccurate multi-bit analog operations, and vulnerability to PVT variations. This paper presents PICO-RAM, a PVT-insensitive and compact CIM SRAM macro with charge-domain bit-parallel computation. It adopts a multi-bit thin-cell Multiply-Accumulate (MAC) unit that shares the same transistor layout as the most compact 6T SRAM cell. All analog computing modules, including digital-to-analog converters (DACs), MAC units, analog shift-and-add, and analog-to-digital converters (ADCs), reuse one set of local capacitors inside the array, performing in-situ computation to save area and enhance accuracy. A compact 8.5-bit dual-threshold time-domain ADC power gates the main path most of the time, leading to a significant energy reduction. Our 65-nm prototype achieves the highest weight storage density of 559 Kb/mm² and exceptional robustness to temperature and voltage variations (−40 to 105 °C and 0.65 to 1.2 V) among SRAM-based analog CIM designs.
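The contrast the abstract draws between conventional digital shift-and-add (bit-serial) and the analog shift-and-add used in PICO-RAM can be sketched with a minimal behavioral model. This is an idealized numerical illustration only, not the paper's circuits: the function names are hypothetical, the "analog" MAC is modeled as an exact integer dot product, and the charge sharing is modeled as ideal 2:1 capacitor merges.

```python
import numpy as np

def bit_serial_mvm(W, x, x_bits=4):
    """Conventional bit-serial CIM: one analog MAC pass per input
    bit-plane, with partial sums combined by DIGITAL shift-and-add."""
    acc = np.zeros(W.shape[0], dtype=np.int64)
    for b in range(x_bits):
        bit_plane = (x >> b) & 1       # 1-bit slice of every input element
        partial = W @ bit_plane        # idealized analog column MAC
        acc += partial << b            # digital shift-and-add of partial results
    return acc

def analog_shift_add(partials):
    """ANALOG shift-and-add modeled as successive 2:1 capacitive charge
    sharing, LSB-first: each merge halves all earlier contributions, so
    the partial for bit b ends up weighted by 2**b (scaled by 2**-B)."""
    v = 0.0
    for p in partials:                 # partials[0] is the LSB partial sum
        v = (v + p) / 2.0              # share charge between two equal caps
    return v                           # = sum_b partials[b] * 2**b / 2**len(partials)
```

For 4-bit unsigned inputs, `bit_serial_mvm(W, x)` reproduces `W @ x` exactly, while `analog_shift_add([p0, p1, p2])` yields `(p0 + 2*p1 + 4*p2) / 8`: the binary-weighted sum compressed into the capacitor voltage range, which is the quantity a single ADC conversion can then digitize instead of multiple per-bit conversions.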

Authors (9)
  1. Zhiyu Chen
  2. Ziyuan Wen
  3. Weier Wan
  4. Akhil Reddy Pakala
  5. Yiwei Zou
  6. Wei-Chen Wei
  7. Zengyi Li
  8. Yubei Chen
  9. Kaiyuan Yang
Citations (2)