CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task (2109.09354v1)

Published 20 Sep 2021 in cs.CL

Abstract: This paper describes the Charles University submission to the Multilingual Low-Resource Translation for Indo-European Languages shared task at WMT21. We competed in translation from Catalan into Romanian, Italian and Occitan. Our systems are based on a shared multilingual model. We show that using a joint model for multiple similar language pairs improves translation quality in each pair. We also demonstrate that character-level bilingual models are competitive for very similar language pairs (Catalan-Occitan) but less so for more distant pairs. We also describe our experiments with multi-task learning, where, aside from textual translation, the models are also trained to perform grapheme-to-phoneme conversion.

Authors (5)
  1. Josef Jon (12 papers)
  2. Michal Novák (8 papers)
  3. João Paulo Aires (6 papers)
  4. Dušan Variš (10 papers)
  5. Ondřej Bojar (91 papers)
Citations (3)