
Neural Machine Translation of Text from Non-Native Speakers (1808.06267v2)

Published 19 Aug 2018 in cs.CL

Abstract: Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.5 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.
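The augmentation idea in the abstract — injecting artificial grammatical errors into clean training sentences — can be sketched as below. This is an illustrative reconstruction, not the authors' exact procedure: the specific noise operations (article deletion, adjacent-word swaps), the probability `p`, and the function names are all assumptions for the sake of the example.

```python
import random

def drop_articles(tokens, rng):
    # Delete English articles, mimicking a common learner error.
    # (Hypothetical noise op; the paper's error taxonomy may differ.)
    out = [t for t in tokens if t.lower() not in {"a", "an", "the"}]
    return out if out else tokens

def swap_adjacent(tokens, rng):
    # Swap one random adjacent token pair to mimic word-order errors.
    if len(tokens) < 2:
        return tokens
    i = rng.randrange(len(tokens) - 1)
    out = tokens[:]
    out[i], out[i + 1] = out[i + 1], out[i]
    return out

def noisify(sentence, seed=0, p=0.5):
    # Apply each noise operation with probability p, so the augmented
    # corpus mixes clean and artificially-errored source sentences.
    rng = random.Random(seed)
    tokens = sentence.split()
    for op in (drop_articles, swap_adjacent):
        if rng.random() < p:
            tokens = op(tokens, rng)
    return " ".join(tokens)

clean = "the cat sat on the mat"
noisy = noisify(clean, seed=1)
```

In a training pipeline, one would pair each noised source sentence with the original clean target translation, so the model learns to produce correct output from ungrammatical input.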

Authors (4)
  1. Antonios Anastasopoulos
  2. Alison Lui
  3. Toan Nguyen
  4. David Chiang
Citations (29)
