Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese Spelling Correction (2311.08219v1)

Published 14 Nov 2023 in cs.CL and cs.AI

Abstract: ChatGPT has demonstrated impressive performance in various downstream tasks. However, in the Chinese Spelling Correction (CSC) task, we observe a discrepancy: while ChatGPT performs well under human evaluation, it scores poorly according to traditional metrics. We believe this inconsistency arises because the traditional metrics are not well-suited for evaluating generative models. Their overly strict length and phonics constraints may lead to underestimating ChatGPT's correction capabilities. To better evaluate generative models in the CSC task, this paper proposes a new evaluation metric: Eval-GCSC. By incorporating word-level and semantic similarity judgments, it relaxes the stringent length and phonics constraints. Experimental results show that Eval-GCSC closely aligns with human evaluations. Under this metric, ChatGPT's performance is comparable to traditional token-level classification models (TCM), demonstrating its potential as a CSC tool. The source code and scripts can be accessed at https://github.com/ktlKTL/Eval-GCSC.
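The mismatch the abstract describes can be illustrated with a minimal sketch. This is not the Eval-GCSC implementation and not the exact traditional metric used in the paper; the function name `strict_sentence_correct` and the example sentences are hypothetical, chosen only to show how a strict equal-length, exact-match check can reject a correction that a human reader would likely accept.

```python
def strict_sentence_correct(source: str, prediction: str, reference: str) -> bool:
    """Hypothetical strict check: the prediction must keep the source length
    and match the reference character for character."""
    return len(prediction) == len(source) and prediction == reference


# Illustrative sentences (assumed, not taken from the paper's data).
source = "我今天很高心"            # "高心" is a misspelling of "高兴"
reference = "我今天很高兴"         # gold correction, same length as the source
tcm_output = "我今天很高兴"        # character-for-character substitution
generative_output = "我今天非常高兴"  # fixes the typo but rephrases, one character longer

print(strict_sentence_correct(source, tcm_output, reference))
print(strict_sentence_correct(source, generative_output, reference))
```

Under this strict check the same-length substitution is accepted, while the longer rephrased output is rejected even though the spelling error is fixed. That is the kind of underestimation the abstract attributes to length constraints, and the motivation for relaxing them with word-level and semantic similarity judgments.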

Authors (7)
  1. Kunting Li
  2. Yong Hu
  3. Shaolei Wang
  4. Hanhan Ma
  5. Liang He
  6. Fandong Meng
  7. Jie Zhou