The University of Helsinki submissions to the WMT19 news translation task (1906.04040v1)
Published 10 Jun 2019 in cs.CL
Abstract: In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.
- Aarne Talman
- Umut Sulubacak
- Raúl Vázquez
- Yves Scherrer
- Sami Virpioja
- Alessandro Raganato
- Arvi Hurskainen
- Jörg Tiedemann