The University of Edinburgh's Submissions to the WMT19 News Translation Task (1907.05854v1)

Published 12 Jul 2019 in cs.CL

Abstract: The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.
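The contrast the abstract draws between character-based tokenisation and sub-word segmentation of Chinese can be illustrated with a minimal sketch. This is not the paper's actual pipeline (the vocabulary and the greedy longest-match segmenter below are hypothetical stand-ins for a trained sub-word model such as BPE):

```python
# Illustrative sketch only: character tokenisation vs. a toy sub-word
# segmentation of Chinese text via greedy longest-match over a fixed,
# hypothetical vocabulary.

def char_tokenise(text: str) -> list[str]:
    """Split text into individual characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]

def subword_tokenise(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedy longest-match segmentation against a sub-word vocabulary.

    Single characters always match as a fallback, so segmentation
    never fails on out-of-vocabulary spans.
    """
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens

# Hypothetical vocabulary for demonstration only.
vocab = {"机器", "翻译", "新闻"}
sentence = "机器翻译新闻"  # "machine translation news"

print(char_tokenise(sentence))            # six single-character tokens
print(subword_tokenise(sentence, vocab))  # ['机器', '翻译', '新闻']
```

Character tokenisation yields longer sequences with a small vocabulary, while sub-word segmentation shortens sequences at the cost of a larger vocabulary; the paper's Chinese experiments weigh exactly this trade-off.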

Authors (7)
  1. Rachel Bawden (25 papers)
  2. Nikolay Bogoychev (17 papers)
  3. Ulrich Germann (6 papers)
  4. Roman Grundkiewicz (16 papers)
  5. Faheem Kirefu (3 papers)
  6. Antonio Valerio Miceli Barone (9 papers)
  7. Alexandra Birch (67 papers)
Citations (32)