Combining SMT and NMT Back-Translated Data for Efficient NMT (1909.03750v1)

Published 9 Sep 2019 in cs.CL

Abstract: Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data generated by different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models, as well as combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
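The data-augmentation scheme the abstract describes can be sketched in a few lines. Below is a minimal, hypothetical Python sketch, not the authors' pipeline: the `translate_nmt` and `translate_smt` callables are assumed stand-ins for trained target-to-source NMT and SMT systems, and the even split of the monolingual pool is just one illustrative way to merge synthetic data from the two approaches.

```python
# Sketch of back-translation data augmentation combining two MT systems.
# translate_nmt / translate_smt are hypothetical placeholders for trained
# target->source models; this is an illustration, not the paper's code.

from typing import Callable, List, Tuple

def back_translate(
    mono_target: List[str],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Turn monolingual target-language sentences into synthetic
    (source, target) pairs by translating each one back to the source side."""
    return [(translate(t), t) for t in mono_target]

def build_training_set(
    parallel: List[Tuple[str, str]],
    mono_target: List[str],
    translate_nmt: Callable[[str], str],  # assumed NMT target->source model
    translate_smt: Callable[[str], str],  # assumed SMT target->source model
) -> List[Tuple[str, str]]:
    # Split the monolingual pool between the two back-translation systems
    # (an arbitrary 50/50 split for illustration), then merge both synthetic
    # sets with the authentic parallel data.
    half = len(mono_target) // 2
    synthetic = (
        back_translate(mono_target[:half], translate_nmt)
        + back_translate(mono_target[half:], translate_smt)
    )
    return parallel + synthetic

if __name__ == "__main__":
    # Toy usage with dummy "translators" standing in for real models.
    pairs = build_training_set(
        parallel=[("ein Haus", "a house")],
        mono_target=["a dog", "a cat"],
        translate_nmt=lambda s: f"<nmt:{s}>",
        translate_smt=lambda s: f"<smt:{s}>",
    )
    print(pairs)
```

The final NMT model would then be trained on the merged list of authentic and synthetic pairs; the paper's finding is that mixing synthetic data from different MT approaches outperforms using either approach alone.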

Authors (5)
  1. Alberto Poncelas (15 papers)
  2. Dimitar Shterionov (16 papers)
  3. Gideon Maillette de Buy Wenniger (10 papers)
  4. Andy Way (46 papers)
  5. Maja Popovic (6 papers)
Citations (18)