Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement (1902.05770v1)

Published 15 Feb 2019 in cs.CL and cs.AI

Abstract: With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation. However, most previous methods combine layers in a static fashion, in that their aggregation strategy is independent of the specific hidden states. Inspired by recent progress on capsule networks, in this paper we propose to use routing-by-agreement strategies to aggregate layers dynamically. Specifically, the algorithm learns the probability of a part (an individual layer representation) being assigned to a whole (the aggregated representation) in an iterative way and combines parts accordingly. We implement our algorithm on top of the state-of-the-art neural machine translation model TRANSFORMER and conduct experiments on the widely used WMT14 English-German and WMT17 Chinese-English translation datasets. Experimental results across both language pairs show that the proposed approach consistently outperforms the strong baseline model and a representative static aggregation model.
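
The abstract only sketches the routing-by-agreement idea: each layer's output is a "part" whose assignment probability to the aggregated "whole" is refined iteratively. The snippet below is a minimal sketch of such a capsule-style dynamic routing step over stacked layer outputs; the tensor shapes, the `squash` non-linearity, the scalar per-layer routing logits, and the fixed three routing iterations are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def squash(x, dim=-1, eps=1e-8):
    # Capsule-style squashing non-linearity: preserves direction,
    # bounds the norm to (0, 1).
    norm_sq = (x * x).sum(dim=dim, keepdim=True)
    return norm_sq / (1.0 + norm_sq) * x / torch.sqrt(norm_sq + eps)

def routing_by_agreement(layer_states, num_iterations=3):
    """Dynamically aggregate stacked layer representations.

    layer_states: (num_layers, batch, seq_len, d_model) -- the "parts".
    Returns: (batch, seq_len, d_model) -- the aggregated "whole".
    """
    num_layers, batch, seq_len, _ = layer_states.shape
    # Routing logits: one scalar per layer and position, initialised to zero.
    logits = torch.zeros(num_layers, batch, seq_len, 1,
                         device=layer_states.device)
    for _ in range(num_iterations):
        # Assignment probability of each part to the whole (softmax over layers).
        coupling = torch.softmax(logits, dim=0)
        # Combine parts according to the current assignment, then squash.
        aggregated = squash((coupling * layer_states).sum(dim=0))
        # Update logits by the agreement (dot product) between part and whole.
        logits = logits + (layer_states * aggregated.unsqueeze(0)).sum(
            dim=-1, keepdim=True)
    return aggregated

# Toy usage: fuse the outputs of 6 encoder layers (batch 2, length 5, d_model 8).
states = torch.randn(6, 2, 5, 8)
fused = routing_by_agreement(states)  # shape: (2, 5, 8)
```

In the paper's setting the routed output would feed into the Transformer encoder or decoder in place of a static layer combination; the toy call above only illustrates the iterative part-to-whole assignment itself.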

Authors (6)
  1. Zi-Yi Dou (33 papers)
  2. Zhaopeng Tu (135 papers)
  3. Xing Wang (191 papers)
  4. Longyue Wang (87 papers)
  5. Shuming Shi (126 papers)
  6. Tong Zhang (569 papers)
Citations (54)
