
Improving Grammatical Error Correction with Machine Translation Pairs (1911.02825v2)

Published 7 Nov 2019 in cs.CL

Abstract: We propose a novel data synthesis method to generate diverse error-corrected sentence pairs for improving grammatical error correction, which is based on a pair of machine translation models of different qualities (i.e., poor and good). The poor translation model resembles the ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammatical correctness, while the good translation model generally generates fluent and grammatically correct translations. We build the poor and good translation models with a phrase-based statistical machine translation model with decreased language model weight and a neural machine translation model, respectively. By taking the pair of their translations of the same sentences in a bridge language as error-corrected sentence pairs, we can construct unlimited pseudo-parallel data. Our approach is capable of generating diverse fluency-improving patterns without being limited by a pre-defined rule set or seed error-corrected data. Experimental results demonstrate the effectiveness of our approach and show that it can be combined with other synthetic data sources to yield further improvements.
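
The pairing step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `synthesize_pairs`, `poor_translate`, and `good_translate` are hypothetical names, the two translators are assumed to be plain callables from a bridge-language sentence to an English string, and dropping pairs whose two translations are identical is an assumption made here for tidiness rather than something the abstract states.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical type alias: any function mapping a bridge-language
# sentence to an English translation.
Translator = Callable[[str], str]


def synthesize_pairs(
    bridge_sentences: Iterable[str],
    poor_translate: Translator,
    good_translate: Translator,
) -> List[Tuple[str, str]]:
    """Build (erroneous, corrected) pairs from two translators.

    The poor translator's output plays the role of the learner-like
    source sentence; the good translator's output plays the role of
    the corrected target sentence.
    """
    pairs: List[Tuple[str, str]] = []
    for sentence in bridge_sentences:
        source = poor_translate(sentence).strip()
        target = good_translate(sentence).strip()
        # Assumption (not from the paper): skip empty outputs and
        # pairs where both systems produced the same text.
        if source and target and source != target:
            pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    # Toy stand-ins for the SMT and NMT systems, keyed by placeholder
    # bridge-sentence IDs; real systems would translate actual text.
    poor = {"s1": "I yesterday go to school .", "s2": "Hello ."}
    good = {"s1": "I went to school yesterday .", "s2": "Hello ."}
    for src, tgt in synthesize_pairs(["s1", "s2"], poor.__getitem__, good.__getitem__):
        print(f"{src}\t{tgt}")
```

In practice the two callables would wrap a phrase-based SMT system with a down-weighted language model and an NMT system, and the resulting pairs would be added to the training data of a grammatical error correction model.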

Authors (6)
  1. Wangchunshu Zhou (73 papers)
  2. Tao Ge (53 papers)
  3. Chang Mu (1 paper)
  4. Ke Xu (309 papers)
  5. Furu Wei (291 papers)
  6. Ming Zhou (182 papers)
Citations (37)
