Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation (2004.02071v1)

Published 5 Apr 2020 in cs.CL

Abstract: We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have shown success in leveraging target-side monolingual data. However, since the quality of back-translation models is tied to the size of the available parallel corpora, the synthetically generated sentences can be of poor quality in a low-resource setting. We propose a simple data augmentation technique to address this shortcoming. We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences. This automatically expands the vocabulary of the model while maintaining high-quality content. Our method shows an appreciable improvement in performance over strong baselines.
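The abstract describes generating synthetic parallel sentences by translating monolingual target-side text word by word with a bilingual dictionary. The sketch below illustrates that general idea; it is not the authors' code, and the dictionary, tokenization, and copy-through fallback for out-of-dictionary words are assumptions made for illustration.

```python
# Minimal sketch (not the authors' implementation) of dictionary-based
# word-by-word data augmentation for semi-supervised NMT.
# Assumptions: whitespace tokenization, a target->source word dictionary,
# and copying unknown tokens through unchanged.

def word_by_word_translate(sentence, bilingual_dict):
    """Translate a target-side monolingual sentence token by token."""
    tokens = sentence.split()
    # Tokens missing from the dictionary are copied through unchanged.
    return " ".join(bilingual_dict.get(tok.lower(), tok) for tok in tokens)


def augment_corpus(monolingual_targets, bilingual_dict):
    """Build synthetic (source, target) pairs from target-side monolingual data."""
    return [
        (word_by_word_translate(tgt, bilingual_dict), tgt)
        for tgt in monolingual_targets
    ]


if __name__ == "__main__":
    # Toy example with a hypothetical English->French word dictionary.
    toy_dict = {"the": "le", "cat": "chat", "sleeps": "dort"}
    mono_targets = ["The cat sleeps"]
    print(augment_corpus(mono_targets, toy_dict))
    # [('le chat dort', 'The cat sleeps')]
```

The synthetic pairs produced this way would be mixed with the available parallel data to train the translation model, which is how the dictionary both expands the vocabulary and supplies additional training signal in the low-resource setting.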

Authors (4)
  1. Sreyashi Nag (16 papers)
  2. Mihir Kale (18 papers)
  3. Varun Lakshminarasimhan (3 papers)
  4. Swapnil Singhavi (2 papers)
Citations (11)
