ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM (2402.14152v1)
Abstract: Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP). ECC is built on modular arithmetic, in which modular multiplication accounts for most of the processing time. The computational complexity and memory constraints of ECC limit its performance, so hardware acceleration of ECC is an active field of research. Processing-in-memory (PIM) is a promising approach to this problem. In this work, we design ModSRAM, the first 8T SRAM PIM architecture to compute large-number modular multiplication efficiently. In addition, we propose R4CSA-LUT, a new look-up table (LUT) based algorithm that reduces the cycle count of the interleaved algorithm and eliminates carry propagation in addition. ModSRAM is co-designed with R4CSA-LUT to support modular multiplication and data reuse in memory, achieving a 52% cycle reduction compared to prior works with only 32% area overhead.
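To make the terms in the abstract concrete, here is a minimal Python sketch of generic radix-4 interleaved modular multiplication, together with a carry-save adder. It is not the R4CSA-LUT algorithm itself: reading the name as radix-4, carry-save addition, and look-up tables is an assumption, the function names `carry_save_add` and `radix4_interleaved_modmul` are hypothetical, and the `%` operator stands in for whatever reduction step the hardware actually performs.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (s, c) such that s + c == x + y + z.

    Each output bit depends only on the three input bits in the same
    column, so no carry ripples across the word.
    """
    s = x ^ y ^ z                              # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carry, shifted up
    return s, c


def radix4_interleaved_modmul(a, b, m):
    """Return (a * b) % m by scanning a two bits at a time, MSB first.

    Reduction is interleaved with accumulation, so the running value
    never grows far beyond the modulus. Radix-4 halves the number of
    iterations relative to a bit-serial (radix-2) scan.
    """
    assert m > 1 and 0 <= a < m and 0 <= b < m

    # Small table of the four possible digit multiples of b (assumed
    # illustration of a LUT; the paper's tables may differ).
    lut = [(d * b) % m for d in range(4)]

    nbits = m.bit_length()
    if nbits % 2:
        nbits += 1                             # pad to a whole number of digits

    acc = 0
    for i in range(nbits - 2, -1, -2):
        digit = (a >> i) & 0b11                # next radix-4 digit of a
        acc = (4 * acc + lut[digit]) % m       # shift, add, reduce
    return acc


if __name__ == "__main__":
    p = 2**256 - 2**32 - 977                   # the secp256k1 field prime
    a = 0x1234567890ABCDEF << 128
    b = p - 12345
    assert radix4_interleaved_modmul(a, b, p) == (a * b) % p
    s, c = carry_save_add(5, 6, 7)
    assert s + c == 18
```

For a 256-bit modulus the loop runs 128 times instead of 256. The carry-save adder is shown separately because combining it with the reduction step requires keeping the accumulator in redundant (sum, carry) form, which this sketch does not attempt.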
Authors:
- Jonathan Ku
- Junyao Zhang
- Haoxuan Shan
- Saichand Samudrala
- Jiawen Wu
- Qilin Zheng
- Ziru Li
- JV Rajendran
- Yiran Chen