APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption (2404.15819v2)
Abstract: Fully Homomorphic Encryption (FHE) is known to be extremely computationally intensive, and application-specific accelerators have emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexity of FHE schemes per se and of multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often face the challenges of low hardware utilization rates and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we propose a layered near-memory computing architecture with fine-grained functional unit design to significantly enhance the utilization rates of computational resources and memory bandwidth. The experimental results illustrate that APACHE outperforms state-of-the-art ASIC FHE accelerators by 10.63x to 35.47x over a variety of application benchmarks, e.g., Lola MNIST, HELR, VSP, and HE$^3$DB.
- Lin Ding
- Song Bian
- Penggao He
- Yan Xu
- Gang Qu
- Jiliang Zhang