Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses (2010.07574v1)

Published 15 Oct 2020 in cs.CL

Abstract: Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which, however, cover only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work will facilitate the development of open-domain GEC models that generalize to different topics and genres.

Authors (5)
  1. Simon Flachs
  2. Ophélie Lacroix
  3. Helen Yannakoudakis
  4. Marek Rei
  5. Anders Søgaard
Citations (23)