
PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN (2405.18744v1)

Published 29 May 2024 in cs.CR

Abstract: The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns because users' queries are sent to the model provider. On the other hand, deploying the LLM on the user's device would expose all of the model data. Existing methods based on secure multiparty computation (MPC) protect the privacy of both the model parameters and the user queries. However, they require gigabytes of data transfer and several minutes to generate a single token, making them impractical for most real-world applications. To improve the efficiency of private LLM inference, we propose PermLLM, which accelerates the evaluation of non-linear functions using secure random permutation. Combined with optimized secret sharing protocols and homomorphic encryption, PermLLM achieves two-party private inference of the ChatGLM-6B model at around 3 s/token under a realistic network setting (10 ms RTT and 1 Gbps bandwidth), which is orders of magnitude faster than existing MPC solutions.
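The abstract only names the idea of evaluating non-linear functions with a secure random permutation, so the following is a minimal single-process sketch of one plausible reading, not the paper's protocol. It assumes additive secret sharing over a 64-bit ring with fixed-point encoding, and it simulates the shuffle in the clear: in a real two-party setting, a secure shuffle sub-protocol would be needed so that one party never learns the values and the other never learns the permutation. All names (`share`, `reconstruct`, `permuted_nonlinear`) and the constants `MOD` and `SCALE` are hypothetical and chosen for illustration.

```python
import math
import random

MOD = 2 ** 64    # ring size for additive shares (assumed, not from the paper)
SCALE = 2 ** 16  # fixed-point scaling factor (assumed, not from the paper)


def encode(v):
    """Map a real number to a fixed-point ring element."""
    return int(round(v * SCALE)) % MOD


def decode(r):
    """Map a ring element back to a real number (upper half is negative)."""
    if r >= MOD // 2:
        r -= MOD
    return r / SCALE


def share(values, rng):
    """Split each value into two additive shares that sum to it mod MOD."""
    s0 = [rng.randrange(MOD) for _ in values]
    s1 = [(encode(v) - a) % MOD for v, a in zip(values, s0)]
    return s0, s1


def reconstruct(s0, s1):
    """Recombine two share vectors into plaintext values."""
    return [decode((a + b) % MOD) for a, b in zip(s0, s1)]


def gelu(v):
    """Exact GELU, used here as an example element-wise non-linearity."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def permuted_nonlinear(s0, s1, f, rng):
    """Apply f element-wise to a shared vector via a random permutation.

    Simulation only: the permutation and both share vectors live in one
    process here, which a real protocol must avoid.
    """
    n = len(s0)
    perm = list(range(n))
    rng.shuffle(perm)  # would be known to only one party

    # Shuffle both share vectors with the same permutation.
    p0 = [s0[i] for i in perm]
    p1 = [s1[i] for i in perm]

    # The other party sees the values in shuffled order and applies f
    # in plaintext, which is far cheaper than evaluating f under MPC.
    shuffled_out = [f(v) for v in reconstruct(p0, p1)]

    # Re-share the result, then undo the permutation on the shares.
    o0, o1 = share(shuffled_out, rng)
    y0, y1 = [0] * n, [0] * n
    for pos, i in enumerate(perm):
        y0[i], y1[i] = o0[pos], o1[pos]
    return y0, y1


rng = random.Random(0)
x = [-1.5, 0.0, 0.25, 2.0]
x0, x1 = share(x, rng)
y0, y1 = permuted_nonlinear(x0, x1, gelu, rng)
result = reconstruct(y0, y1)
```

In this sketch the party that evaluates `f` sees the multiset of values but not their positions, so any privacy comes from the permutation alone; how PermLLM bounds or mitigates that leakage is not stated in the abstract.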
