Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

MTCSC: Retrieval-Augmented Iterative Refinement for Chinese Spelling Correction (2504.18938v1)

Published 26 Apr 2025 in cs.CL

Abstract: Chinese Spelling Correction (CSC) aims to detect and correct erroneous tokens in sentences. While LLMs have shown remarkable success in identifying and rectifying potential errors, they often struggle with maintaining consistent output lengths and adapting to domain-specific corrections. Furthermore, existing CSC task impose rigid constraints requiring input and output lengths to be identical, limiting their applicability. In this work, we extend traditional CSC to variable-length correction scenarios, including Chinese Splitting Error Correction (CSEC) and ASR N-best Error Correction. To address domain adaptation and length consistency, we propose MTCSC (Multi-Turn CSC) framework based on RAG enhanced with a length reflection mechanism. Our approach constructs a retrieval database from domain-specific training data and dictionaries, fine-tuning retrievers to optimize performance for error-containing inputs. Additionally, we introduce a multi-source combination strategy with iterative length reflection to ensure output length fidelity. Experiments across diverse domain datasets demonstrate that our method significantly outperforms current approaches in correction quality, particularly in handling domain-specific and variable-length error correction tasks.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Junhong Liang (3 papers)
  2. Yu Zhou (335 papers)
X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets