
Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation (2203.08442v1)

Published 16 Mar 2022 in cs.CL and cs.AI

Abstract: In this paper, we present a substantial step in better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. By carefully designing experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: On one hand, it helps NMT models to produce more diverse translations and reduce adequacy-related translation errors. On the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit the translation quality (i.e., domain discrepancy) and induce the over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach can consistently improve both translation performance and model robustness upon Seq2Seq pretraining.
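For context, the sketch below illustrates the generic pretrain-then-finetune pipeline the abstract refers to: a jointly pretrained encoder-decoder model finetuned on parallel data with the standard cross-entropy objective, so that the pretrained decoder (the paper's focus) is updated alongside the encoder. This is a minimal illustration using Hugging Face's mBART-50 checkpoint as an assumed stand-in, not the authors' setup; the proposed in-domain pretraining and input adaptation strategies are not shown, since the abstract does not spell out their implementation details.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Assumed checkpoint: a multilingual Seq2Seq model pretrained with a denoising
# objective. The paper's exact pretrained model and data are not specified here.
model_name = "facebook/mbart-large-50"
tokenizer = MBart50TokenizerFast.from_pretrained(
    model_name, src_lang="en_XX", tgt_lang="de_DE"
)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# One toy parallel sentence pair (hypothetical data, for illustration only).
src = ["The findings were published last week."]
tgt = ["Die Ergebnisse wurden letzte Woche veröffentlicht."]

# During finetuning, gradients flow into both the encoder and the jointly
# pretrained decoder under the usual cross-entropy (teacher-forcing) loss.
batch = tokenizer(src, text_target=tgt, return_tensors="pt", padding=True)
loss = model(**batch).loss
loss.backward()
```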

Authors (7)
  1. Wenxuan Wang (128 papers)
  2. Wenxiang Jiao (44 papers)
  3. Yongchang Hao (11 papers)
  4. Xing Wang (191 papers)
  5. Shuming Shi (126 papers)
  6. Zhaopeng Tu (135 papers)
  7. Michael Lyu (27 papers)
Citations (25)
