The University of Edinburgh's Submissions to the WMT19 News Translation Task (1907.05854v1)
Abstract: The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.
- Rachel Bawden
- Nikolay Bogoychev
- Ulrich Germann
- Roman Grundkiewicz
- Faheem Kirefu
- Antonio Valerio Miceli Barone
- Alexandra Birch