
Dynamic Multi-Branch Layers for On-Device Neural Machine Translation (2105.06679v2)

Published 14 May 2021 in cs.CL

Abstract: With the rapid development of AI, there is a trend toward moving AI applications, such as neural machine translation (NMT), from the cloud to mobile devices. Constrained by limited hardware resources and battery life, the performance of on-device NMT systems is far from satisfactory. Inspired by conditional computation, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference. As not all branches are activated during training, we propose shared-private reparameterization to ensure sufficient training for each branch. At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English translation task over the Transformer model. Compared with a strong baseline that also uses multiple branches, the proposed method is up to 1.5 times faster with the same number of parameters.
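
The abstract describes two mechanisms: a per-layer gate that activates exactly one branch per input, and a shared-private reparameterization in which each branch's effective weights combine shared and branch-specific parameters. Below is a minimal PyTorch sketch of one such feed-forward sub-layer, based only on the abstract; the straight-through Gumbel-softmax gate, the additive form of the reparameterization, and all names (`DynamicMultiBranchFFN`, `num_branches`, etc.) are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicMultiBranchFFN(nn.Module):
    """Feed-forward sub-layer with N branches; one branch is active per input.

    Each branch's effective weights are reparameterized as shared + private
    (one reading of the abstract's "shared-private reparameterization").
    """

    def __init__(self, d_model: int, d_hidden: int, num_branches: int):
        super().__init__()
        self.num_branches = num_branches
        # Shared parameters: updated by every training example.
        self.w1_shared = nn.Parameter(torch.empty(d_hidden, d_model))
        self.w2_shared = nn.Parameter(torch.empty(d_model, d_hidden))
        # Private parameters: one set per branch, updated only when selected.
        self.w1_private = nn.Parameter(torch.empty(num_branches, d_hidden, d_model))
        self.w2_private = nn.Parameter(torch.empty(num_branches, d_model, d_hidden))
        for p in (self.w1_shared, self.w2_shared, self.w1_private, self.w2_private):
            nn.init.xavier_uniform_(p)
        # Gate scoring branches from the mean token representation (assumed).
        self.gate = nn.Linear(d_model, num_branches)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        logits = self.gate(x.mean(dim=1))  # (batch, num_branches)
        if self.training:
            # Straight-through Gumbel-softmax: hard one-hot in the forward
            # pass, differentiable surrogate in the backward pass, so the
            # gate stays trainable even though only one branch is active.
            choice = F.gumbel_softmax(logits, tau=1.0, hard=True)
            w1 = self.w1_shared + torch.einsum("bn,nhd->bhd", choice, self.w1_private)
            w2 = self.w2_shared + torch.einsum("bn,ndh->bdh", choice, self.w2_private)
        else:
            # Inference: index the single selected branch directly.
            branch = logits.argmax(dim=-1)  # (batch,)
            w1 = self.w1_shared + self.w1_private[branch]
            w2 = self.w2_shared + self.w2_private[branch]
        h = F.relu(torch.einsum("bsd,bhd->bsh", x, w1))
        return torch.einsum("bsh,bdh->bsd", h, w2)


layer = DynamicMultiBranchFFN(d_model=512, d_hidden=2048, num_branches=4)
out = layer(torch.randn(2, 10, 512))
print(out.shape)  # torch.Size([2, 10, 512])
```

Because only one branch's weights are applied to each example at inference, the per-example cost matches a single feed-forward sub-layer, consistent with the abstract's "almost the same computational cost" claim; the shared component receives gradients from every example, while each private component trains only when its branch is selected, which is the problem the shared-private reparameterization addresses.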

Authors (6)
  1. Zhixing Tan (20 papers)
  2. Zeyuan Yang (8 papers)
  3. Meng Zhang (184 papers)
  4. Qun Liu (230 papers)
  5. Maosong Sun (337 papers)
  6. Yang Liu (2253 papers)
Citations (3)
