
Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task (2307.03972v1)

Published 8 Jul 2023 in cs.CL

Abstract: Large-scale LLMs have shown remarkable capabilities on a wide range of NLP tasks and have attracted much attention recently. However, some studies indicate that LLMs fail to outperform state-of-the-art models on English grammatical error correction (GEC) tasks. In this report, we explore how LLMs perform on Chinese grammatical error correction tasks and provide guidance for future work. We conduct experiments with three LLMs of different model scales on four Chinese GEC datasets. Our experimental results indicate that the performance of LLMs on automatic evaluation metrics falls short of previous state-of-the-art models because of the problem of over-correction. Furthermore, we observe notable variations in LLM performance across different data distributions. Our findings demonstrate that further investigation is required before LLMs can be reliably applied to the Chinese GEC task.
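The over-correction finding can be made concrete: Chinese GEC benchmarks are typically scored with an edit-level, precision-weighted F0.5 (in the style of the M2 scorer or ChERRANT), so every fluent-but-unrequested rewrite an LLM makes counts as a false positive. Below is a minimal sketch of such a metric, not the paper's actual evaluation code; the f_beta helper and the edit tuples are illustrative assumptions.

```python
# Sketch of edit-level precision/recall/F0.5, the style of metric commonly
# used for GEC evaluation. Edits are illustrative, not from the paper.

def f_beta(system_edits: set, gold_edits: set, beta: float = 0.5) -> tuple:
    """Score a system's edit set against a gold edit set.

    Each edit is a (start, end, replacement) tuple; real scorers also
    handle multiple references per sentence.
    """
    tp = len(system_edits & gold_edits)   # correct edits
    fp = len(system_edits - gold_edits)   # unnecessary edits (over-correction)
    fn = len(gold_edits - system_edits)   # missed edits
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0
    return p, r, f

# Example: the reference requires one edit, but the system also makes two
# extra fluency edits that no reference asked for.
gold = {(3, 4, "的")}
system = {(3, 4, "的"), (0, 1, "我们"), (7, 8, "了")}
p, r, f05 = f_beta(system, gold)
print(f"P={p:.2f} R={r:.2f} F0.5={f05:.2f}")  # P=0.33 R=1.00 F0.5=0.38
```

Because beta = 0.5 weights precision twice as heavily as recall, the two unrequested edits drop F0.5 from 1.00 to 0.38 even though the one required correction was found. This is the mechanism by which over-correction depresses the automatic scores reported in the paper.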

Authors (2)
  1. Fanyi Qu (7 papers)
  2. Yunfang Wu (50 papers)
Citations (5)