CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction (2407.00934v1)
Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies. To bridge this gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. Together, these dimensions reveal the critical characteristics of GEC systems and help locate their drawbacks. Combining these dimensions to evaluate systems yields higher consistency with human judgements than other reference-based and reference-less metrics. Extensive experiments on 2 human judgement datasets and 6 reference datasets demonstrate the effectiveness and robustness of our method. All code will be released after peer review.
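To make the four dimensions concrete, the following is a minimal illustrative sketch of how hypothesis edits might be classified against reference edits and combined into a single score. The edit representation, classification rules, and weighting scheme here are assumptions for illustration only, not the paper's actual CLEME2.0 algorithm.

```python
# Hypothetical sketch: classify a GEC system's edits into the four dimensions
# named in the abstract. Edits are dicts mapping (start, end) source spans to
# correction strings; this representation is an assumption, not the paper's.

def classify_edits(ref_edits, hyp_edits):
    """Count hit-, error-, under-, and over-corrections."""
    hit = err = over = 0
    for span, corr in hyp_edits.items():
        if span in ref_edits:
            if ref_edits[span] == corr:
                hit += 1   # hit-correction: right span, right fix
            else:
                err += 1   # error-correction: right span, wrong fix
        else:
            over += 1      # over-correction: edited a span needing no fix
    under = sum(1 for s in ref_edits if s not in hyp_edits)  # missed fixes
    return {"hit": hit, "error": err, "under": under, "over": over}

def combined_score(counts, w_err=1.0, w_under=1.0, w_over=1.0):
    # Hypothetical combination: reward hits, penalise the three failure modes.
    total = (counts["hit"] + w_err * counts["error"]
             + w_under * counts["under"] + w_over * counts["over"])
    return counts["hit"] / total if total else 1.0

ref = {(0, 1): "has", (3, 4): "apples"}
hyp = {(0, 1): "has", (3, 4): "apple", (6, 7): "the"}
counts = classify_edits(ref, hyp)
print(counts)                   # {'hit': 1, 'error': 1, 'under': 0, 'over': 1}
print(combined_score(counts))   # 1 / (1 + 1 + 0 + 1) ≈ 0.333
```

Reporting the four counts separately, rather than only the combined score, is what makes the evaluation interpretable: each count points to a distinct failure mode of the system under test.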
- Jingheng Ye (15 papers)
- Zishan Xu (8 papers)
- Yinghui Li (65 papers)
- Xuxin Cheng (42 papers)
- Linlin Song (1 paper)
- Qingyu Zhou (28 papers)
- Hai-Tao Zheng (94 papers)
- Ying Shen (76 papers)
- Xin Su (67 papers)