
APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption (2404.15819v2)

Published 24 Apr 2024 in cs.AR

Abstract: Fully Homomorphic Encryption (FHE) is known to be extremely computationally intensive, and application-specific accelerators have emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexity of FHE schemes per se and of multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often suffer from low hardware utilization rates and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we propose a layered near-memory computing architecture with fine-grained functional unit design to significantly enhance the utilization rates of computational resources and memory bandwidth. The experimental results illustrate that APACHE outperforms state-of-the-art ASIC FHE accelerators by 10.63x to 35.47x over a variety of application benchmarks, e.g., Lola MNIST, HELR, VSP, and HE$^3$DB.

References (77)
  1. “Lattigo v5,” Online: https://github.com/tuneinsight/lattigo, Nov. 2023, EPFL-LDS, Tune Insight SA.
  2. R. Agrawal, L. de Castro, G. Yang, C. Juvekar, R. T. Yazicigil, A. P. Chandrakasan, V. Vaikuntanathan, and A. Joshi, “FAB: an fpga-based accelerator for bootstrappable fully homomorphic encryption,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 882–895.
  3. S. Akleylek, Ö. Dagdelen, and Z. Y. Tok, “On the efficiency of polynomial multiplication for lattice-based cryptography on gpus using CUDA,” in Cryptography and Information Security in the Balkans - Second International Conference, vol. 9540, 2015, pp. 155–168.
  4. A. A. Badawi, B. Veeravalli, C. F. Mun, and K. M. M. Aung, “High-performance FV somewhat homomorphic encryption on gpus: An implementation using CUDA,” IACR Transactions on Cryptographic Hardware and Embedded Systems, no. 2, pp. 70–95, 2018.
  5. R. Banno, K. Matsuoka, N. Matsumoto, S. Bian, M. Waga, and K. Suenaga, “Oblivious online monitoring for safety LTL specification via fully homomorphic encryption,” in 34th International Conference on Computer Aided Verification (CAV), vol. 13371, 2022, pp. 447–468.
  6. M. V. Beirendonck, J. D’Anvers, and I. Verbauwhede, “FPT: a fixed-point accelerator for torus fully homomorphic encryption,” IACR Cryptol. ePrint Arch., p. 1635, 2022.
  7. S. Bian, Z. Zhang, H. Pan, R. Mao, Z. Zhao, Y. Jin, and Z. Guan, “HE$^3$DB: An efficient and elastic encrypted database via arithmetic-and-logic fully homomorphic encryption,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2023.
  8. A. Brutzkus, O. Elisha, and R. Gilad-Bachrach, “Low latency privacy preserving inference,” in Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
  9. X. Cao, C. Moore, M. O’Neill, E. O’Sullivan, and N. Hanley, “Optimised multiplication architectures for accelerating fully homomorphic encryption,” IEEE Transactions on Computers, vol. 65, 2016.
  10. A. Chatterjee and I. Sengupta, “Furisc: Fhe encrypted urisc design,” Cryptology ePrint Archive, Paper 2015/699, 2015. [Online]. Available: https://eprint.iacr.org/2015/699
  11. C. Chen, C. Shen, and J. Zhang, “Lightweight and secure branch predictors against spectre attacks,” in 27th Asia and South Pacific Design Automation Conference, (ASPDAC), 2022, pp. 25–30.
  12. J. H. Cheon, K. Han, and M. Hhan, “Faster homomorphic discrete fourier transforms and improved FHE bootstrapping,” IACR Cryptol. ePrint Arch., 2018.
  13. J. H. Cheon, A. Kim, M. Kim, and Y. S. Song, “Homomorphic encryption for arithmetic of approximate numbers,” in 23rd International Conference on the Theory and Applications of Cryptology and Information Security, vol. 10624, 2017, pp. 409–437.
  14. J. H. Cheon, M. Kim, and M. Kim, “Optimized search-and-compute circuits and their application to query evaluation on encrypted data,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 1, pp. 188–199, 2016.
  15. I. Chillotti, N. Gama, M. Georgieva, and M. Izabachène, “Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds,” in 22nd International Conference on the Theory and Application of Cryptology and Information Security, vol. 10031, 2016, pp. 3–33.
  16. W. Dai and B. Sunar, “cuhe: A homomorphic encryption accelerator library,” in Cryptography and Information Security in the Balkans - Second International Conference, vol. 9540, 2015, pp. 169–186.
  17. R. Dathathri, O. Saarikivi, H. Chen, K. Laine, K. E. Lauter, S. Maleki, M. Musuvathi, and T. Mytkowicz, “CHET: an optimizing compiler for fully-homomorphic neural-network inferencing,” in Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, (PLDI), 2019, pp. 142–156.
  18. X. Dong, C. Xu, Y. Xie, and N. P. Jouppi, “Nvsim: A circuit-level performance, energy, and area model for emerging nonvolatile memory,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 7, pp. 994–1007, 2012.
  19. L. Ducas and D. Micciancio, “FHEW: bootstrapping homomorphic encryption in less than a second,” in Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, 2015.
  20. J. Fan and F. Vercauteren, “Somewhat practical fully homomorphic encryption,” IACR Cryptol. ePrint Arch., p. 144, 2012. [Online]. Available: http://eprint.iacr.org/2012/144
  21. S. Fan, Z. Wang, W. Xu, R. Hou, D. Meng, and M. Zhang, “Tensorfhe: Achieving practical computation on encrypted data using gpgpu,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 922–934.
  22. D. Froelicher, J. R. Troncoso-Pastoriza, J. L. Raisaro, M. A. Cuendet, and J. P. Hubaux, “Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption,” Nature Communications, vol. 12, no. 1, 2021.
  23. S. Gupta, R. Cammarota, and T. Rosing, “Memfhe: End-to-end computing with fully homomorphic encryption in memory,” 2022.
  24. S. Gupta and T. S. Rosing, “Invited: Accelerating fully homomorphic encryption with processing in memory,” in ACM/IEEE Design Automation Conference (DAC), 2021, pp. 1335–1338.
  25. K. Han, S. Hong, J. H. Cheon, and D. Park, “Logistic regression on homomorphic encrypted data at scale,” in The 33rd AAAI Conference on Artificial Intelligence, (AAAI), 2019, pp. 9466–9471.
  26. M. Han, Y. Zhu, Q. Lou, Z. Zhou, S. Guo, and L. Ju, “Coxhe: A software-hardware co-design framework for fpga acceleration of homomorphic computation,” in Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe, (DATE), 2022.
  27. HElib, “Cuda-accelerated fully homomorphic encryption library,” 2019. [Online]. Available: https://github.com/homenc/HElib/tree/master/examples/BGV_country_db_lookup (archived at https://perma.cc/U2MW-QLRJ).
  28. Intel. (2018) Intel® 64 and IA-32 Architectures Optimization Reference Manual. [Online]. Available: https://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
  29. F. Jafarzadehpour, A. S. Molahosseini, A. A. E. Zarandi, and L. Sousa, “Efficient modular adder designs based on thermometer and one-hot coding,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 9, pp. 2142–2155, 2019.
  30. L. Jiang, Q. Lou, and N. Joshi, “MATCHA: a fast and energy-efficient accelerator for fully homomorphic encryption over the torus,” in ACM/IEEE Design Automation Conference (DAC), 2022, pp. 235–240.
  31. W. Jung, S. Kim, J. H. Ahn, J. H. Cheon, and Y. Lee, “Over 100x faster bootstrapping in fully homomorphic encryption through memory-centric optimization with gpus,” IACR Transactions on Cryptographic Hardware and Embedded Systems, no. 4, pp. 114–148, 2021.
  32. A. Khedr and G. Gulak, “Homomorphic processing unit (hpu) for accelerating secure computations under homomorphic encryption,” 2019.
  33. H. Kim, J. Mu, C. Yu, T. T.-H. Kim, and B. Kim, “A 1-16b reconfigurable 80kb 7t sram-based digital near-memory computing macro for processing neural networks,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 70, no. 4, pp. 1580–1590, 2023.
  34. J. Kim, S. Kim, J. Choi, J. Park, D. Kim, and J. H. Ahn, “Sharp: A short-word hierarchical accelerator for robust and practical fully homomorphic encryption,” in Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), 2023.
  35. J. Kim, G. Lee, S. Kim, G. Sohn, M. Rhu, J. Kim, and J. H. Ahn, “ARK: fully homomorphic encryption accelerator with runtime data generation and inter-operation key reuse,” in 55th IEEE/ACM International Symposium on Microarchitecture (Micro), 2022, pp. 1237–1254.
  36. S. Kim, J. Kim, M. J. Kim, W. Jung, J. Kim, M. Rhu, and J. H. Ahn, “BTS: an accelerator for bootstrappable fully homomorphic encryption,” in The 49th Annual International Symposium on Computer Architecture (ISCA), 2022, pp. 711–725.
  37. Y. Kim, W. Yang, and O. Mutlu, “Ramulator: A fast and extensible dram simulator,” IEEE Computer Architecture Letters, vol. 15, no. 1, pp. 45–49, 2016.
  38. H. Ku, W. Susilo, Y. Zhang, W. Liu, and M. Zhang, “Privacy-preserving federated learning in medical diagnosis with homomorphic re-encryption,” Computer Standards & Interfaces, vol. 80, 2022.
  39. C. Lee, S. Min, J. Seo, and Y. Song, “Faster TFHE bootstrapping with block binary keys,” in Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security (ASIACCS), 2023, pp. 2–13.
  40. E. Lee, J. Lee, J. Lee, Y. Kim, Y. Kim, J. No, and W. Choi, “Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions,” in International Conference on Machine Learning (ICML), 2022, pp. 12 403–12 422.
  41. J.-W. Lee, H. Kang, Y. Lee, W. Choi, J. Eom, M. Deryabin, E. Lee, J. Lee, D. Yoo, Y.-S. Kim, and J.-S. No, “Privacy-preserving machine learning with fully homomorphic encryption for deep neural network,” IEEE Access, vol. 10, pp. 30 039–30 054, 2022.
  42. J. Lin, L. Liang, Z. Qu, I. Ahmad, L. Liu, F. Tu, T. Gupta, Y. Ding, and Y. Xie, “Inspire: In-storage private information retrieval via protocol and architecture co-design,” in Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA), 2022, pp. 102–115.
  43. M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, “Meltdown: Reading Kernel Memory from User Space,” in USENIX Security Symposium, 2018, pp. 973–990.
  44. V. Lyubashevsky, C. Peikert, and O. Regev, “On ideal lattices and learning with errors over rings,” in 29th Annual International Conference on the Theory and Applications of Cryptographic Techniques, 2010.
  45. C. Marcolla, V. Sucasas, M. Manzano, R. Bassoli, F. H. P. Fitzek, and N. Aaraj, “Survey on fully homomorphic encryption, theory, and applications,” Proceedings of the IEEE, vol. 110, pp. 1572–1609, 2022.
  46. K. Matsuoka, R. Banno, N. Matsumoto, T. Sato, and S. Bian, “Virtual secure platform: A five-stage pipeline processor over TFHE,” in USENIX Security Symposium, 2021, pp. 4007–4024.
  47. A. S. Molahosseini, A. Asadpoor, A. A. E. Zarandi, and L. Sousa, “Towards efficient modular adders based on reversible circuits,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2018.
  48. N. Muralimanohar, R. Balasubramonian, and N. Jouppi, “Optimizing nuca organizations and wiring alternatives for large caches with cacti 6.0,” in IEEE/ACM International Symposium on Microarchitecture (MICRO), 2007, pp. 3–14.
  49. K. Nam, H. Oh, H. Moon, and Y. Paek, “Accelerating n-bit operations over tfhe on commodity cpu-fpga,” in Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2022.
  50. nucypher, “Nufhe, a gpu-powered torus fhe implementation,” https://github.com/nucypher/nufhe, 2019.
  51. R. Paludo and L. Sousa, “Ntt architecture for a linux-ready risc-v fully-homomorphic encryption accelerator,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 7, pp. 2669–2682, 2022.
  52. Prasetiyo, A. Putra, and J.-Y. Kim, “Morphling: A throughput-maximized tfhe-based accelerator using transform-domain reuse,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024.
  53. A. Putra, Prasetiyo, Y. Chen, J. Kim, and J.-Y. Kim, “Strix: An end-to-end streaming architecture with two-level ciphertext batching for fully homomorphic encryption with programmable bootstrapping,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.
  54. B. Reagen, W. Choi, Y. Ko, V. T. Lee, H. S. Lee, G. Wei, and D. Brooks, “Cheetah: Optimizing and accelerating homomorphic encryption for private inference,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 26–39.
  55. D. Reis, J. Takeshita, T. Jung, M. T. Niemier, and X. S. Hu, “Computing-in-memory for performance and energy-efficient homomorphic encryption,” IEEE Trans. Very Large Scale Integr. Syst., vol. 28, no. 11, pp. 2300–2313, 2020.
  56. M. S. Riazi, K. Laine, B. Pelton, and W. Dai, “HEAX: an architecture for computing on encrypted data,” in Architectural Support for Programming Languages and Operating Systems, Lausanne, (ASPLOS), 2020, pp. 1295–1309.
  57. S. Rixner, W. J. Dally, B. Khailany, P. R. Mattson, U. J. Kapasi, and J. D. Owens, “Register organization for media processing,” in Proceedings of the Sixth International Symposium on High-Performance Computer Architecture (HPCA), 2000, pp. 375–386.
  58. L. Rovida, “Fast but approximate homomorphic k-means based on masking technique,” Int. J. Inf. Sec., 2023.
  59. N. Samardzic, A. Feldmann, A. Krastev, S. Devadas, R. G. Dreslinski, C. Peikert, and D. Sánchez, “F1: A fast and programmable accelerator for fully homomorphic encryption,” in 54th Annual IEEE/ACM International Symposium on Microarchitecture (Micro), 2021, pp. 238–252.
  60. N. Samardzic, A. Feldmann, A. Krastev, N. Manohar, N. Genise, S. Devadas, K. Eldefrawy, C. Peikert, and D. Sánchez, “Craterlake: a hardware accelerator for efficient unbounded computation on encrypted data,” in The 49th Annual International Symposium on Computer Architecture (ISCA), 2022, pp. 173–187.
  61. C. Shen, C. Chen, and J. Zhang, “Micro-architectural cache side-channel attacks and countermeasures,” in 26th Asia and South Pacific Design Automation Conference, (ASPDAC), 2021, pp. 441–448.
  62. D. Soni, M. Nabeel, H. Gamil, O. Mazonka, B. Reagen, R. Karri, and M. Maniatakos, “Design space exploration of modular multipliers for asic fhe accelerators,” in 24th International Symposium on Quality Electronic Design (ISQED), 2023, pp. 1–8.
  63. J. Takeshita, D. Reis, T. Gong, M. Niemier, X. S. Hu, and T. Jung, “Accelerating finite-field and torus fhe via compute-enabled (s)ram,” IEEE Transactions on Computers, pp. 1–14, 2023.
  64. S. Tan, B. Knott, Y. Tian, and D. J. Wu, “Cryptgpu: Fast privacy-preserving machine learning on the GPU,” in IEEE Symposium on Security and Privacy (SP), 2021.
  65. W. Tan, A. Wang, X. Zhang, Y. Lao, and K. K. Parhi, “High-speed VLSI architectures for modular polynomial multiplication via fast filtering and applications to lattice-based cryptography,” IEEE Transactions on Computers, vol. 72, no. 9, pp. 2454–2466, 2023.
  66. vernamlab, “Cuda-accelerated fully homomorphic encryption library,” https://github.com/vernamlab/cuFHE.
  67. J. Wang, C. Xiong, K. Zhang, and J. Wei, “Fixed-point analysis and parameter optimization of the radix-$2^k$ pipelined FFT processor,” IEEE Transactions on Signal Processing, vol. 63, no. 18, pp. 4879–4893, 2015.
  68. W. Wang, Z. Chen, and X. Huang, “Accelerating leveled fully homomorphic encryption using GPU,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2014, pp. 2800–2803.
  69. W. Wang, X. Huang, N. Emmart, and C. Weems, “Vlsi design of a large-number multiplier for fully homomorphic encryption,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 9, pp. 1879–1887, 2014.
  70. X. Wang, J. Yang, Y. Zhao, X. Jia, R. Yin, X. Chen, G. Qu, and W. Zhao, “Triangle counting accelerations: From algorithm to in-memory computing architecture,” IEEE Transactions on Computers, vol. 71, no. 10, pp. 2462–2472, 2022.
  71. Z. Wang, P. Li, R. Hou, Z. Li, J. Cao, X. Wang, and D. Meng, “He-booster: An efficient polynomial arithmetic acceleration on gpus for fully homomorphic encryption,” IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 4, pp. 1067–1081, 2023.
  72. S.-Y. Wu, K.-Y. Chen, and M.-D. Shieh, “Efficient vlsi architecture of bluestein’s fft for fully homomorphic encryption,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2022, pp. 2242–2245.
  73. X. Xie, P. Gu, Y. Ding, D. Niu, H. Zheng, and Y. Xie, “Mpu: Memory-centric simt processor via in-dram near-bank computing,” ACM Transactions on Architecture and Code Optimization, vol. 20, 2023.
  74. Y. Yang, H. Lu, and X. Li, “Poseidon-ndp: Practical fully homomorphic encryption accelerator based on near data processing architecture,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 12, pp. 4749–4762, 2023.
  75. Y. Yang, H. Zhang, S. Fan, H. Lu, M. Zhang, and X. Li, “Poseidon: Practical homomorphic encryption accelerator,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023.
  76. P. Zhang, T. Huang, X. Sun, W. Zhao, H. Liu, S. Lai, and J. K. Liu, “Privacy-preserving and outsourced multi-party k-means clustering based on multi-key fully homomorphic encryption,” IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 3, 2023.
  77. Y. Zhang, S. Wang, X. Zhang, J. Dong, X. Mao, F. Long, C. Wang, D. Zhou, M. Gao, and G. Sun, “Pipezk: Accelerating zero-knowledge proof with a pipelined architecture,” in 48th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), 2021, pp. 416–428.
Authors (6)
  1. Lin Ding (6 papers)
  2. Song Bian (21 papers)
  3. Penggao He (1 paper)
  4. Yan Xu (258 papers)
  5. Gang Qu (40 papers)
  6. Jiliang Zhang (45 papers)
