SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC (2401.00793v5)
Abstract: With the growing use of Transformer models hosted on cloud platforms to offer inference services, privacy concerns are escalating, especially for sensitive data such as investment plans and bank account details. Secure Multi-Party Computation (SMPC) is a promising solution for protecting the privacy of both inference data and model parameters. However, applying SMPC to Privacy-Preserving Inference (PPI) for Transformer models often leads to considerable slowdowns or declines in performance. This is largely due to the many nonlinear operations in the Transformer architecture, which are ill-suited to SMPC and difficult to circumvent or optimize effectively. To address this, we introduce SecFormer, a comprehensive PPI framework that achieves fast and accurate PPI for Transformer models. We eliminate the high-cost exponential and maximum operations in PPI without sacrificing model performance, and we develop a suite of efficient SMPC protocols that employ suitable numerical computation methods to accelerate the other complex nonlinear functions in PPI, including GeLU, LayerNorm, and a redesigned Softmax. Our extensive experiments show that SecFormer outperforms MPCFormer, improving performance by $3.4\%$ and $24.7\%$ for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, respectively. In terms of efficiency, SecFormer is 3.57 and 3.58 times faster than PUMA for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, demonstrating its effectiveness and speed.
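The "suitable numerical computation methods" the abstract refers to work by replacing operations that are expensive under SMPC (division, square root, exponentiation, maximum) with iterations built from additions and multiplications, which secret-sharing schemes evaluate cheaply. As a concrete illustration, here is a minimal plain-Python sketch of Goldschmidt's division by convergence (Goldschmidt, 1964; Markstein, 2004), the method the reference list points to. This is not SecFormer's actual protocol, which runs such arithmetic on fixed-point secret shares; the function name, scaling loop, and iteration count below are illustrative assumptions.

```python
def goldschmidt_div(a: float, b: float, iterations: int = 5) -> float:
    """Approximate a / b using only multiplication and subtraction.

    Sketch of division by convergence (Goldschmidt, 1964): scale the
    divisor into [0.5, 1), then repeatedly multiply numerator and
    denominator by a correction factor f = 2 - d so that d -> 1 and
    the numerator converges quadratically to a / b. Assumes b > 0;
    an SMPC protocol would perform the scaling with dedicated
    sub-protocols on secret-shared fixed-point values.
    """
    assert b > 0, "this sketch handles positive divisors only"
    scale = 1.0
    while b >= 1.0:          # scale divisor down into [0.5, 1)
        b *= 0.5
        scale *= 0.5
    while b < 0.5:           # scale divisor up into [0.5, 1)
        b *= 2.0
        scale *= 2.0
    n, d = a * scale, b      # invariant: n / d equals the original a / b
    for _ in range(iterations):
        f = 2.0 - d          # correction factor; f -> 1 as d -> 1
        n, d = n * f, d * f  # n / d unchanged; d = 1 - (1 - d)^2 -> 1
    return n                 # d ~= 1, so n ~= a / b


print(goldschmidt_div(7.0, 3.0))  # ~2.333333
```

The same multiply-and-correct structure extends to square roots and reciprocal square roots (relevant to LayerNorm), which is why iterative methods of this kind are preferred over SMPC-hostile primitives such as exponentials and comparisons.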
References:
- ChatGPT evaluation on sentence-level relations: A focus on temporal, causal, and discourse relations. arXiv preprint arXiv:2304.14827.
- THE-X: Privacy-preserving transformer inference with homomorphic encryption. In Findings of the Association for Computational Linguistics, pages 3510–3520.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- PUMA: Secure inference of LLaMA-7B in five minutes. arXiv preprint arXiv:2307.12533.
- CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on Machine Learning, JMLR Workshop and Conference Proceedings, pages 201–210. JMLR.org.
- How to play any mental game or A completeness theorem for protocols with honest majority. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, pages 218–229. ACM.
- Robert E. Goldschmidt. 1964. Applications of division by convergence. M.Sc. dissertation, Massachusetts Institute of Technology.
- Iron: Private inference on transformers. Advances in Neural Information Processing Systems, 35:15718–15731.
- Cheetah: Lean and fast secure two-party deep neural network inference. In Proceedings of 31st USENIX Security Symposium, pages 809–826. USENIX Association.
- CrypTen: Secure multi-party computation meets machine learning. Advances in Neural Information Processing Systems, 34:4961–4973.
- Does BERT pretrained on clinical notes reveal sensitive data? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, pages 946–959. Association for Computational Linguistics.
- The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059. Association for Computational Linguistics.
- MPCFormer: Fast, performant and private transformer inference with MPC. arXiv preprint arXiv:2211.01452.
- Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 619–631.
- RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Peter W. Markstein. 2004. Software division and square root using Goldschmidt’s algorithms. In Proceedings of the 6th Conference on Real Numbers and Computers, pages 146–157.
- Delphi: A cryptographic inference service for neural networks. In Proceedings of 29th USENIX Security Symposium, pages 2505–2522. USENIX Association.
- Payman Mohassel and Peter Rindal. 2018. ABY$^{3}$: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 35–52. ACM.
- Payman Mohassel and Yupeng Zhang. 2017. SecureML: A system for scalable privacy-preserving machine learning. In Proceedings of 2017 IEEE Symposium on Security and Privacy, pages 19–38. IEEE.
- OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- Exploiting novel GPT-4 APIs. arXiv preprint arXiv:2312.14302.
- Fábio Perez and Ian Ribeiro. 2022. Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527.
- Improving language understanding by generative pre-training.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
- SiRNN: A math library for secure RNN inference. In Proceedings of 2021 IEEE Symposium on Security and Privacy, pages 1003–1020. IEEE.
- Adi Shamir. 1979. How to share a secret. Communications of the ACM, 22(11):612–613.
- CryptGPU: Fast privacy-preserving machine learning on the GPU. In Proceedings of 2021 IEEE Symposium on Security and Privacy, pages 1021–1038. IEEE.
- Attention is all you need. In Advances in Neural Information Processing Systems. Curran Associates, Inc.
- SecureNN: 3-Party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies, pages 26–49.
- Falcon: Honest-majority maliciously secure framework for private deep learning. Proceedings on Privacy Enhancing Technologies, pages 188–208.
- GLUE: A multi-task benchmark and analysis platform for natural language understanding. In 7th International Conference on Learning Representations. OpenReview.net.
- Characterization of MPC-based private inference for transformer-based models. In Proceedings of 2022 IEEE International Symposium on Performance Analysis of Systems and Software, pages 187–197. IEEE.
- Andrew Chi-Chih Yao. 1986. How to generate and exchange secrets. In Annual Symposium on Foundations of Computer Science, pages 162–167.
- MPCViT: Searching for MPC-friendly vision transformer with heterogeneous attention. arXiv preprint arXiv:2211.13955.