Investigating Glyph Phonetic Information for Chinese Spell Checking: What Works and What's Next (arXiv:2212.04068v3)

Published 8 Dec 2022 in cs.CL and cs.AI

Abstract: While pre-trained Chinese LLMs have demonstrated impressive performance on a wide range of NLP tasks, the Chinese Spell Checking (CSC) task remains a challenge. Previous research has explored using information such as glyphs and phonetics to improve the ability to distinguish misspelled characters, with good results. However, the generalization ability of these models is not well understood: it is unclear whether they incorporate glyph-phonetic information and, if so, whether this information is fully utilized. In this paper, we aim to better understand the role of glyph-phonetic information in the CSC task and suggest directions for improvement. Additionally, we propose a new, more challenging, and practical setting for testing the generalizability of CSC models. All code is made publicly available.

Authors (4)
  1. Xiaotian Zhang (35 papers)
  2. Yanjun Zheng (3 papers)
  3. Hang Yan (86 papers)
  4. Xipeng Qiu (257 papers)
Citations (5)