Mixture of Small and Large Models for Chinese Spelling Check
The paper "Mixture of Small and Large Models for Chinese Spelling Check" addresses the Chinese Spelling Check (CSC) task in NLP. It examines the limitations of current methods built on large language models (LLMs) and on fine-tuned BERT-based models, and proposes a hybrid approach that combines the two to improve CSC performance.
Core Premises and Methodology
The CSC task aims to correct spelling errors in Chinese text, which impair comprehension and downstream applications such as machine translation and information retrieval. Fine-tuning BERT-based models on domain-specific data has traditionally worked well, but these models tend to overfit, memorizing specific edit patterns and failing to transfer across domains. LLMs, by contrast, generalize well across domains thanks to their broad pretraining, yet they frequently over-polish text for fluency, producing outputs whose length no longer matches the input and thereby breaking the character-for-character alignment that CSC assumes.
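The length-consistency constraint above can be made concrete with a small sketch. This is illustrative only (the helper names and the example sentences are invented, not from the paper): CSC is conventionally a length-preserving task, so a correction that changes the character count, as LLM over-polishing often does, can be flagged immediately.

```python
def is_valid_csc_output(source: str, corrected: str) -> bool:
    """A CSC correction must preserve the character count of the source."""
    return len(source) == len(corrected)

def diff_positions(source: str, corrected: str):
    """Return (index, src_char, new_char) for each substituted character."""
    return [(i, s, c)
            for i, (s, c) in enumerate(zip(source, corrected))
            if s != c]

# A character-level correction (回 -> 会) keeps the length; a fluent
# LLM-style rewrite that inserts a character does not.
print(is_valid_csc_output("我们明天开回", "我们明天开会"))    # length preserved
print(is_valid_csc_output("我们明天开回", "我们明天要开会"))  # length changed
```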
The paper introduces a dynamic mixture method that merges the strengths of BERT-based models and LLMs: during beam search decoding, it combines the next-token probability distributions of the two models. The mixture retains the correction precision of the small BERT-based model and the fluency of the LLM while requiring no fine-tuning of the LLM itself. It thus addresses the shortcomings of each single-model approach and enables efficient domain adaptation.
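A minimal sketch of this idea follows. The fixed interpolation weight `alpha`, the toy vocabularies, and the function names are illustrative assumptions, not the paper's exact formulation (the paper mixes the distributions dynamically); the sketch only shows the mechanics of interpolating two next-token distributions inside a beam search step.

```python
import math

def mix_distributions(p_small, p_llm, alpha=0.7):
    """Interpolate two next-token distributions (dicts token -> prob).
    `alpha` weights the small fine-tuned model; the result is renormalized."""
    tokens = set(p_small) | set(p_llm)
    mixed = {t: alpha * p_small.get(t, 0.0) + (1 - alpha) * p_llm.get(t, 0.0)
             for t in tokens}
    z = sum(mixed.values())
    return {t: p / z for t, p in mixed.items()}

def beam_search_step(beams, p_small_fn, p_llm_fn, beam_width=2, alpha=0.7):
    """Expand each beam (tokens, log_prob) with the mixed distribution
    and keep the top `beam_width` candidates by accumulated log-probability."""
    candidates = []
    for tokens, logp in beams:
        mixed = mix_distributions(p_small_fn(tokens), p_llm_fn(tokens), alpha)
        for tok, p in mixed.items():
            if p > 0:
                candidates.append((tokens + [tok], logp + math.log(p)))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_width]

# Toy models: the small CSC model strongly prefers the corrected character
# 的, while the LLM leans toward the fluent-but-wrong 地. The mixture lets
# the precise small model dominate while the LLM still shapes the scores.
p_small = lambda tokens: {"的": 0.9, "地": 0.1}
p_llm = lambda tokens: {"的": 0.4, "地": 0.6}
best = beam_search_step([([], 0.0)], p_small, p_llm, beam_width=1)
print(best[0][0])  # the mixed distribution favors 的
```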
Experimental Results
The method was evaluated on mainstream benchmarks, including the rSIGHAN, CSCD-NS, MCSCSet, ECSpell, and LEMON datasets, and achieved state-of-the-art (SOTA) results on several of them. Notably, the approach yields an average increase of 10.4% in sentence-level scores across LEMON's diverse domains and a 4.8% improvement on ECSpell. These results demonstrate the hybrid model's robustness and efficiency, with lower false positive rates and better recall without sacrificing precision.
Implications and Future Direction
The implications of this research span both practical and theoretical domains within CSC tasks and wider NLP applications. Practically, the dynamic mixture of models promises reduced computational costs and increased efficiency by eliminating the need for LLM fine-tuning. Theoretically, it lays a foundation for exploring similar hybrid approaches in other language-based error detection systems. By leveraging the complementary properties of distinct model architectures, it paves the way for more generalized language correction frameworks.
Looking forward, adapting the mixture method to other languages, or to related tasks such as grammatical error correction, is a promising research direction. Additionally, reconciling the approach with techniques like in-context learning (ICL) could refine its applicability and extend its utility to conversational and generative LLMs.
In conclusion, the paper's hybrid methodology represents a meaningful advance in CSC, providing a balanced solution to the precision-fluency trade-off inherent in contemporary models. Combining the complementary strengths of small and large models in this way suggests broader applicability across NLP tasks that demand both accuracy and fluency.