LM-Critic: Language Models for Unsupervised Grammatical Error Correction (2109.06822v2)

Published 14 Sep 2021 in cs.CL and cs.LG

Abstract: Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs, but manually annotating such pairs can be expensive. Recently, the Break-It-Fix-It (BIFI) framework has demonstrated strong results on learning to repair a broken program without any labeled examples, but this relies on a perfect critic (e.g., a compiler) that returns whether an example is valid or not, which does not exist for the GEC task. In this work, we show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector. We evaluate our approach on GEC datasets across multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo) and show that it outperforms existing methods in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).

Overview of LM-Critic: Language Models for Unsupervised Grammatical Error Correction

This paper presents a novel approach to grammatical error correction (GEC) that uses language models (LMs) as critics, a method referred to as LM-Critic. The proposed method addresses the expense of obtaining labeled data for GEC by leveraging pretrained LMs to form an unsupervised framework for identifying and correcting grammatical errors.

Methodology

The central component of this approach is the LM-Critic, which assesses the grammaticality of a sentence based on the probabilities assigned by a pretrained language model such as GPT-2. The LM-Critic operates on the principle that a grammatical sentence should receive a higher probability than any of its local perturbations, a concept termed the local optimum criterion. By employing this criterion, LM-Critic serves as a cost-effective means to bootstrap realistic ungrammatical/grammatical sentence pairs from unlabeled data for training GEC models.
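To make the criterion concrete, the sketch below scores a sentence and a toy neighborhood of perturbations with GPT-2 via the Hugging Face transformers library. The adjacent-word-swap perturbation function is a simplified stand-in for the paper's actual word- and character-level edit neighborhoods, so treat this as illustrative rather than as the authors' implementation.

```python
# Minimal sketch of the local optimum criterion, assuming GPT-2 from the
# Hugging Face `transformers` library. The perturbation neighborhood here
# is deliberately simplistic.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_prob(sentence: str) -> float:
    """Total LM log-probability of a sentence under GPT-2."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing `labels=ids` makes the model return the mean
        # cross-entropy over the predicted tokens, so the total
        # log-probability is -loss * (number of predicted tokens).
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def perturbations(sentence: str):
    """Toy neighborhood: swap each pair of adjacent words."""
    words = sentence.split()
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        yield " ".join(swapped)

def lm_critic(sentence: str) -> bool:
    """Judge a sentence grammatical iff it is a local optimum of the LM score."""
    score = log_prob(sentence)
    return all(log_prob(p) <= score for p in perturbations(sentence))

print(lm_critic("The cat sat on the mat."))
```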

The LM-Critic is paired with the Break-It-Fix-It (BIFI) framework, which iteratively grows the training data with naturally occurring errors mined from unlabeled real-world text. Guided by the LM-Critic's judgments, BIFI alternately refines the corrector (fixer) and the error generator (breaker), producing increasingly realistic training pairs without direct human annotation; one round of this loop is sketched below.
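The following is a minimal, hedged sketch of a single BIFI round using a critic like the one above. The `fixer`, `breaker`, and `train` callables are hypothetical placeholders introduced for illustration; in the paper these are transformer sequence-to-sequence models with full fine-tuning loops, not simple functions.

```python
# One BIFI round, heavily simplified. `fixer` and `breaker` map a sentence
# to a sentence; `train(model, pairs)` returns a model fit on (src, tgt)
# pairs; `critic` is a grammaticality judge such as lm_critic above.
def bifi_round(unlabeled, fixer, breaker, critic, train):
    bad = [s for s in unlabeled if not critic(s)]
    good = [s for s in unlabeled if critic(s)]

    # 1. Run the fixer on real ungrammatical sentences; keep only the
    #    pairs whose outputs the critic accepts as grammatical.
    fixed_pairs = [(b, fixer(b)) for b in bad]
    fixed_pairs = [(b, f) for b, f in fixed_pairs if critic(f)]

    # 2. Train the breaker on the inverted pairs, then corrupt real
    #    grammatical sentences; keep only the pairs whose outputs the
    #    critic rejects, i.e. realistic errors.
    breaker = train(breaker, [(f, b) for b, f in fixed_pairs])
    broken_pairs = [(breaker(g), g) for g in good]
    broken_pairs = [(b, g) for b, g in broken_pairs if not critic(b)]

    # 3. Retrain the fixer on the union of both verified pair sets.
    fixer = train(fixer, fixed_pairs + broken_pairs)
    return fixer, breaker
```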

Results

The efficacy of the LM-Critic approach is demonstrated through evaluation on several GEC datasets, namely CoNLL-2014, BEA-2019, GMEG-wiki, and GMEG-yahoo. In both unsupervised and supervised settings, the approach outperforms traditional methods relying on synthetic data. Specifically, the framework shows a notable improvement in the unsupervised setting, with an average gain of +7.7 in F0.5 score across evaluated datasets compared to baselines trained solely on synthetic data. In the supervised setting, where labeled data is available, the framework still provides a measurable improvement of +0.5 in F0.5 over existing state-of-the-art systems such as GECToR.

Implications and Future Directions

The implications of LM-Critic are significant for the field of natural language processing, specifically in domains and languages where labeled GEC data are scarce. This work suggests a paradigm shift in GEC training, moving towards leveraging the expansive capabilities of pretrained LMs in generating and assessing grammatical data, effectively reducing dependency on expensive labeled resources.

There is potential for further research into the design of the perturbation neighborhood and into tighter integration of LM-based critics with various GEC frameworks. The interplay between critic-based evaluation and data generation remains fertile ground for exploration, potentially leading to even more robust models for linguistic tasks.

Conclusion

The paper represents a significant stride in applying unsupervised learning paradigms to grammatical error correction by marrying them with the capabilities of large pretrained language models. LM-Critic exemplifies how nuanced signals from LMs can be applied concretely to improve both the quality and the quantity of GEC training data, ultimately advancing the fidelity and applicability of GEC systems in real-world settings.

Authors (3)
  1. Michihiro Yasunaga (48 papers)
  2. Jure Leskovec (233 papers)
  3. Percy Liang (239 papers)
Citations (46)