HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference (2401.15970v2)
Abstract: Secure two-party computation with homomorphic encryption (HE) protects data privacy with a formal security guarantee but suffers from high communication overhead. While previous works, e.g., Cheetah and Iron, have proposed efficient HE-based protocols for different neural network (NN) operations, they still assume high precision, e.g., 37-bit fixed point, for the NN operations and ignore NNs' native robustness against quantization error. In this paper, we propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols. We observe that the benefit of a naive combination of quantization and HE quickly saturates as bit precision goes down. Hence, to further improve communication efficiency, we propose a series of optimizations, including an intra-coefficient packing algorithm and a quantization-aware tiling algorithm, to simultaneously reduce the number and precision of the transferred data. Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, and Iron, HEQuant achieves $3.5\sim 23.4\times$ communication reduction and $3.0\sim 9.3\times$ latency reduction. Meanwhile, when compared with prior-art network optimization frameworks, e.g., SENet and SNL, HEQuant also achieves $3.1\sim 3.6\times$ communication reduction.
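The abstract does not spell out the intra-coefficient packing algorithm. The sketch below only illustrates the general intuition that low-precision values leave room to pack several of them into one wide integer, and it works on plain integers with no encryption involved. The names `pack` and `unpack`, the 4-bit widths, and the field layout are assumptions made for this illustration; they are not HEQuant's actual protocol.

```python
# Illustrative only: pack several low-bit values into one wide integer so that
# a single multiplication by a low-bit weight processes all of them at once.
# Plain integers stand in for plaintext coefficients; nothing is encrypted here.

def pack(values, field_bits):
    """Pack non-negative integers into one integer, field_bits per value."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << field_bits):
            raise ValueError("value does not fit in its field")
        packed |= v << (i * field_bits)
    return packed


def unpack(packed, count, field_bits):
    """Split a packed integer back into `count` fields of field_bits each."""
    mask = (1 << field_bits) - 1
    return [(packed >> (i * field_bits)) & mask for i in range(count)]


ACT_BITS = 4      # assumed activation precision
WEIGHT_BITS = 4   # assumed weight precision
# A 4-bit x 4-bit product fits in 8 bits, so 8-bit fields never carry
# into their neighbours after the multiplication.
FIELD_BITS = ACT_BITS + WEIGHT_BITS

activations = [3, 15, 7, 0]
weight = 13

packed = pack(activations, FIELD_BITS)          # four values in 32 bits
products = unpack(packed * weight, len(activations), FIELD_BITS)

assert products == [a * weight for a in activations]
```

In this toy setting, four 4-bit activations occupy 32 bits in total, so one wide value carries four results instead of one. The lower the precision, the more values fit in the same width.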
- EfficientMask-Net for face authentication in the era of COVID-19 pandemic. Signal, Image and Video Processing, 16(7):1991–1999, 2022.
- LSQ+: Improving low-bit quantization through learnable offsets and better initialization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 696–697, 2020.
- Siri, Alexa, and other digital assistants: a study of customer satisfaction with artificial intelligence applications. In The Role of Smart Technologies in Decision Making, pages 35–70. Routledge, 2022.
- EzPC: Programmable, efficient, and scalable secure two-party computation for machine learning. Cryptology ePrint Archive, 2017.
- Sphynx: A deep neural network design for private inference. IEEE Security & Privacy, 20(5):22–34, 2022a.
- Selective network linearization for efficient private inference. In International Conference on Machine Learning, pages 3947–3961. PMLR, 2022b.
- Impala: Low-latency, communication-efficient private deep learning inference, 2022.
- CHET: an optimizing compiler for fully-homomorphic neural-network inferencing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019.
- ABY - a framework for efficient mixed-protocol secure two-party computation. In Proceedings 2015 Network and Distributed System Security Symposium, 2015.
- HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. Advances in Neural Information Processing Systems, 33:18518–18529, 2020.
- TensorFHE: Achieving practical computation on encrypted data using GPGPU. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 922–934. IEEE, 2023.
- Characterizing and optimizing end-to-end systems for private inference, 2022.
- A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision, pages 291–326. Chapman and Hall/CRC, 2022.
- CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210. PMLR, 2016.
- LLAMA: A low latency math library for secure inference. Proceedings on Privacy Enhancing Technologies, 2022(4):274–294, 2022.
- Iron: Private inference on transformers. In Advances in Neural Information Processing Systems, 2022.
- Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Cheetah: Lean and fast secure two-party deep neural network inference. In 31st USENIX Security Symposium (USENIX Security 22), pages 809–826, 2022.
- COINN: Crypto/ML codesign for oblivious inference via neural networks. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 3266–3281, 2021.
- DeepReDuce: ReLU reduction for fast private inference. In International Conference on Machine Learning, pages 4839–4849. PMLR, 2021.
- GAZELLE: A low latency framework for secure neural network inference, 2018.
- End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nature Machine Intelligence, 3(6):473–484, 2021.
- Optimized privacy-preserving CNN inference with fully homomorphic encryption. IEEE Transactions on Information Forensics and Security, 18:2175–2187, 2023.
- Hyphen: A hybrid packing method and optimizations for homomorphic encryption-based neural networks. arXiv preprint arXiv:2302.02407, 2023.
- ARK: Fully homomorphic encryption accelerator with runtime data generation and inter-operation key reuse. In 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1237–1254. IEEE, 2022.
- Adam: A method for stochastic optimization. arXiv: Learning, 2014.
- Raghuraman Krishnamoorthi. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342, 2018.
- CrypTFlow: Secure TensorFlow inference. In 2020 IEEE Symposium on Security and Privacy (SP), pages 336–353. IEEE, 2020.
- Learning to linearize deep neural networks for secure and efficient private inference. arXiv preprint arXiv:2301.09254, 2023.
- Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions. In International Conference on Machine Learning, pages 12403–12422. PMLR, 2022a.
- Privacy-preserving machine learning with fully homomorphic encryption for deep neural network. IEEE Access, 10:30039–30054, 2022b.
- Muse: Secure inference resilient to malicious clients. In 30th USENIX Security Symposium (USENIX Security 21), pages 2201–2218, 2021.
- Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017.
- MetaPruning: Meta learning for automatic neural network channel pruning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3296–3305, 2019.
- Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4942–4952, 2022.
- BumbleBee: Secure two-party inference framework for large transformers. Cryptology ePrint Archive, 2023.
- Delphi: A cryptographic inference service for neural networks, 2020.
- SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pages 19–38. IEEE, 2017.
- A white paper on neural network quantization. arXiv preprint arXiv:2106.08295, 2021.
- Fully homomorphically encrypted deep learning as a service. Machine Learning and Knowledge Extraction, 3(4):819–834, 2021.
- Toward practical privacy-preserving convolutional neural networks exploiting fully homomorphic encryption. arXiv preprint arXiv:2310.16530, 2023.
- Binary neural networks: A survey. Pattern Recognition, 105:107281, 2020a.
- Forward and backward information retention for accurate binary neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2250–2259, 2020b.
- CrypTFlow2: Practical 2-party secure inference. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 325–342, 2020.
- SiRnn: A math library for secure RNN inference. In 2021 IEEE Symposium on Security and Privacy (SP), pages 1003–1020. IEEE, 2021.
- SEAL. Microsoft SEAL (release 3.6). https://github.com/Microsoft/SEAL, 2020. Microsoft Research, Redmond, WA.
- ABNN2. In Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022.
- EMP-toolkit: Efficient MultiParty computation toolkit. https://github.com/emp-toolkit, 2016.
- Falcon: Accelerating homomorphically encrypted convolutions for efficient private mobile network inference. arXiv preprint arXiv:2308.13189, 2023.
- Kohei Yamamoto. Learnable companding quantization for accurate low-bit neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5029–5038, 2021.
- HAWQ-V3: Dyadic neural network quantization. In International Conference on Machine Learning, pages 11875–11886. PMLR, 2021.
- A comprehensive review of binary neural network. Artificial Intelligence Review, pages 1–65, 2023.
- CoPriv: Network/protocol co-optimization for communication-efficient private inference. arXiv preprint arXiv:2311.01737, 2023.