Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data (2105.15071v2)
Abstract: The scarcity of parallel data is a major obstacle to training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translation to and from a low-resource language using only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation, and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on seven languages from three different language families and show that our technique significantly improves translation into the low-resource language compared to other translation baselines.
- Wei-Jen Ko
- Ahmed El-Kishky
- Adithya Renduchintala
- Vishrav Chaudhary
- Naman Goyal
- Francisco Guzmán
- Pascale Fung
- Philipp Koehn
- Mona Diab
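
The abstract names three objectives that NMT-Adapt combines over monolingual data: denoising autoencoding, back-translation, and an adversarial term. The sketch below shows, in generic PyTorch, one plausible way such losses could be combined in a single training step. The toy GRU encoder-decoder, the gradient-reversal discriminator, the word-dropout noise, and all sizes and loss weights are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Minimal sketch of combining denoising autoencoding, back-translation, and an
# adversarial language objective in one step. Everything here (model, noise,
# weights, sizes) is an illustrative assumption, not NMT-Adapt's actual setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, N_LANGS = 1000, 64, 2  # toy sizes (assumptions); 2 langs = LRL vs. HRL

class Seq2Seq(nn.Module):
    """Tiny GRU encoder-decoder standing in for the multilingual NMT model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)
        self.decoder = nn.GRU(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, VOCAB)

    def encode(self, src):
        states, h = self.encoder(self.embed(src))
        return states, h

    def forward(self, src, tgt_in):
        _, h = self.encode(src)
        dec, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec)

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

model = Seq2Seq()
disc = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, N_LANGS))
opt = torch.optim.Adam(list(model.parameters()) + list(disc.parameters()), lr=1e-4)

def word_dropout(x, p=0.1):
    """Crude noise for the autoencoding objective: drop tokens to id 0."""
    return x.masked_fill(torch.rand_like(x.float()) < p, 0)

def train_step(mono, lang_id, bt_src, bt_tgt):
    # 1) Denoising autoencoding: reconstruct monolingual text from a noised copy.
    logits = model(word_dropout(mono), mono[:, :-1])
    l_dae = F.cross_entropy(logits.reshape(-1, VOCAB), mono[:, 1:].reshape(-1))

    # 2) Back-translation: train on (synthetic source, real target) pairs;
    #    bt_src would come from translating bt_tgt with the current model.
    logits = model(bt_src, bt_tgt[:, :-1])
    l_bt = F.cross_entropy(logits.reshape(-1, VOCAB), bt_tgt[:, 1:].reshape(-1))

    # 3) Adversarial: a discriminator predicts the language of mean-pooled
    #    encoder states; gradient reversal drives the encoder to fool it.
    states, _ = model.encode(mono)
    pred = disc(GradReverse.apply(states.mean(dim=1)))
    labels = torch.full((mono.size(0),), lang_id, dtype=torch.long)
    l_adv = F.cross_entropy(pred, labels)

    loss = l_dae + l_bt + 0.1 * l_adv  # loss weighting is an assumption
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random token ids standing in for real batches.
mono = torch.randint(1, VOCAB, (4, 12))
bt_src, bt_tgt = torch.randint(1, VOCAB, (4, 12)), torch.randint(1, VOCAB, (4, 12))
print(train_step(mono, lang_id=0, bt_src=bt_src, bt_tgt=bt_tgt))
```

One design note on the adversarial term: the gradient-reversal layer lets a single optimizer serve both players, since the discriminator receives the ordinary classification gradient while the encoder receives its negation, nudging encoder states for the related languages toward a shared space.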