MERGE: Fast Private Text Generation (2305.15769v3)

Published 25 May 2023 in cs.CL and cs.AI

Abstract: The drastic increase in the number of LLM parameters has led to a new trend of deploying models on cloud servers, raising growing concerns about private inference for Transformer-based models. Existing two-party privacy-preserving techniques, however, only consider natural language understanding (NLU) scenarios; private inference for natural language generation (NLG), crucial for applications like translation and code completion, remains underexplored. In addition, previous privacy-preserving techniques suffer from convergence issues during model training and exhibit poor inference speed on NLG models because they neglect the time-consuming operations in auto-regressive generation. To address these issues, we propose MERGE, a fast private text generation framework for Transformer-based LLMs. MERGE reuses the output hidden state as the word embedding to bypass the embedding computation, and reorganizes the linear operations in the Transformer module to accelerate the forward procedure. Extensive experiments show that MERGE achieves a 26.5x speedup over the vanilla encrypted model at sequence length 512 and reduces communication cost by 80%, with up to a 10x speedup over state-of-the-art approximation-based models.
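
To make the embedding-reuse idea concrete, below is a minimal plaintext sketch, not the authors' implementation: `TinyDecoder`, `vanilla_step`, and `merge_step` are hypothetical names, and the secure two-party computation layer is omitted entirely. It contrasts a vanilla auto-regressive step, whose argmax and embedding-table lookup are expensive under encryption, with a step that feeds the output hidden state straight back as the next word embedding.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy stand-in for a Transformer-based language model (hypothetical)."""
    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward_embeds(self, embeds):
        # Causal mask omitted for brevity.
        return self.backbone(embeds)

def vanilla_step(model, embeds):
    """Standard decoding step: logits -> argmax -> embedding lookup.
    The argmax (comparisons) and table lookup are costly under MPC."""
    hidden = model.forward_embeds(embeds)[:, -1, :]
    next_id = model.lm_head(hidden).argmax(dim=-1)
    return model.embedding(next_id)

def merge_step(model, embeds):
    """MERGE-style step: reuse the output hidden state as the next word
    embedding, bypassing both the argmax and the embedding lookup."""
    return model.forward_embeds(embeds)[:, -1, :]

model = TinyDecoder()
prompt = model.embedding(torch.tensor([[1, 2, 3]]))  # (batch, seq, d_model)
for _ in range(4):
    nxt = merge_step(model, prompt)
    prompt = torch.cat([prompt, nxt.unsqueeze(1)], dim=1)
print(prompt.shape)  # torch.Size([1, 7, 32])
```

The design point this sketch illustrates: argmax and table lookup are comparison- and communication-heavy under two-party protocols, whereas reusing the hidden state keeps each decoding step inside the linear and attention operations that the framework already accelerates.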

Authors (6)
  1. Zi Liang (10 papers)
  2. Pinghui Wang (49 papers)
  3. Ruofei Zhang (24 papers)
  4. Nuo Xu (37 papers)
  5. Lifeng Xing (3 papers)
  6. Shuo Zhang (256 papers)
Citations (5)
