Learning from the Dictionary: Heterogeneous Knowledge Guided Fine-tuning for Chinese Spell Checking (2210.10320v1)

Published 19 Oct 2022 in cs.CL

Abstract: Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling errors. Recent work builds on the knowledge of pretrained language models and incorporates multimodal information into CSC models to improve performance. However, it overlooks the rich knowledge in the dictionary, the reference book where one can learn how a character should be pronounced, written, and used. In this paper, we propose the LEAD framework, which enables the CSC model to learn heterogeneous knowledge from the dictionary in terms of phonetics, vision, and meaning. LEAD first constructs positive and negative samples according to the knowledge of character phonetics, glyphs, and definitions in the dictionary. Then a unified contrastive learning-based training scheme is employed to refine the representations of the CSC models. Extensive experiments and detailed analyses on the SIGHAN benchmark datasets demonstrate the effectiveness of our proposed methods.
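
The abstract describes building positive and negative samples and refining representations with contrastive learning. As a rough illustration of that general idea only, the sketch below computes a generic InfoNCE-style contrastive loss over toy vectors. It is not the LEAD implementation: the function names, the temperature value, and the two-dimensional vectors are assumptions made for this example, and the paper's actual sample construction and loss may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative, not the paper's exact loss).

    The loss is small when the anchor is closer to the positive sample
    than to any of the negative samples.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy vectors standing in for character representations (hypothetical values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]                    # e.g. a sample sharing dictionary knowledge
negatives = [[0.0, 1.0], [-1.0, 0.0]]    # e.g. unrelated samples

aligned_loss = info_nce(anchor, positive, negatives)
# Swap roles: treat an unrelated vector as the "positive" to see the loss grow.
misaligned_loss = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

In this toy setup the loss is lower when the designated positive actually lies near the anchor, which is the property a contrastive objective relies on to pull related representations together and push unrelated ones apart.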

Authors (10)
  1. Yinghui Li (65 papers)
  2. Shirong Ma (23 papers)
  3. Qingyu Zhou (28 papers)
  4. Zhongli Li (11 papers)
  5. Li Yangning (2 papers)
  6. Shulin Huang (12 papers)
  7. Ruiyang Liu (15 papers)
  8. Chao Li (429 papers)
  9. Yunbo Cao (43 papers)
  10. Haitao Zheng (50 papers)
Citations (31)
