Exploring Unsupervised Pretraining Objectives for Machine Translation (2106.05634v1)

Published 10 Jun 2021 in cs.CL

Abstract: Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We pretrain models with different methods on English$\leftrightarrow$German, English$\leftrightarrow$Nepali and English$\leftrightarrow$Sinhala monolingual data, and evaluate them on NMT. In (semi-) supervised NMT, varying the pretraining objective leads to surprisingly small differences in the finetuned performance, whereas unsupervised NMT is much more sensitive to it. To understand these results, we thoroughly study the pretrained models using a series of probes and verify that they encode and use information in different ways. We conclude that finetuning on parallel data is mostly sensitive to a few properties that are shared by most models, such as a strong decoder, in contrast to unsupervised NMT that also requires models with strong cross-lingual abilities.
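
The abstract contrasts two families of pretraining inputs: masked inputs (MLM-style) versus inputs that look like real sentences, produced by reordering and replacing words. The sketch below is not the authors' code; it is a minimal Python illustration of the two kinds of input corruption, where the replacement step uses random vocabulary words as a stand-in for the paper's context-based replacement, and the ratios and window size are arbitrary illustrative values.

```python
import random

def mask_noising(tokens, mask_ratio=0.35, mask_token="<mask>"):
    """Replace a random subset of tokens with a mask symbol (MLM-style input)."""
    out = list(tokens)
    n_mask = max(1, int(len(out) * mask_ratio))
    for i in random.sample(range(len(out)), n_mask):
        out[i] = mask_token
    return out

def reorder_replace_noising(tokens, vocab, shuffle_window=3, replace_ratio=0.15):
    """Build a full-sentence-like input by local reordering plus word replacement."""
    # Local shuffle: each token can move by at most `shuffle_window` positions.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(tokens))]
    out = [tok for _, tok in sorted(zip(keys, tokens))]
    # Replace a fraction of tokens; here with random words, whereas the paper
    # replaces words based on their context.
    for i in range(len(out)):
        if random.random() < replace_ratio:
            out[i] = random.choice(vocab)
    return out

if __name__ == "__main__":
    sent = "the quick brown fox jumps over the lazy dog".split()
    vocab = ["cat", "red", "walks", "under", "small", "bird"]
    print("masked:   ", " ".join(mask_noising(sent)))
    print("reordered:", " ".join(reorder_replace_noising(sent, vocab)))
```

In a sequence-to-sequence pretraining setup, either corrupted version would be fed to the encoder while the decoder is trained to reconstruct the original sentence.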

Authors (4)
  1. Christos Baziotis (13 papers)
  2. Ivan Titov (108 papers)
  3. Alexandra Birch (67 papers)
  4. Barry Haddow (59 papers)
Citations (6)
