Error-Robust Retrieval for Chinese Spelling Check (2211.07843v2)

Published 15 Nov 2022 in cs.CL

Abstract: Chinese Spelling Check (CSC) aims to detect and correct erroneous tokens in Chinese text and has a wide range of applications. However, it faces two challenges: annotated data is insufficient, and previous methods may not fully leverage the existing datasets. In this paper, we introduce a plug-and-play retrieval method with error-robust information for Chinese Spelling Check (RERIC), which can be directly applied to existing CSC models. The retrieval datastore is built entirely from the training data, with a design tailored to the characteristics of CSC. Specifically, we employ multimodal representations that fuse phonetic, morphological, and contextual information when computing queries and keys during retrieval, which enhances robustness against potential errors. Furthermore, to better judge the retrieved candidates, the n-gram surrounding the token to be checked is stored as the value and used for reranking. Experimental results on the SIGHAN benchmarks demonstrate that the proposed method achieves substantial improvements over existing work.
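
The retrieval pipeline described in the abstract (fused keys, n-gram values, reranking, plug-in use with an existing model) can be illustrated with a toy sketch. This is not the authors' implementation: the concatenation in `fuse`, the squared-L2 distance, the position-wise n-gram overlap score, the softmax weighting, and the interpolation weight `lam` are all assumptions chosen for illustration, and every function and class name here is hypothetical.

```python
import math
from collections import defaultdict


def fuse(phonetic, glyph, context):
    """Build one key from three feature vectors (assumed: plain concatenation)."""
    return list(phonetic) + list(glyph) + list(context)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ngram_overlap(a, b):
    """Fraction of positions where two equal-length n-grams agree."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


class Datastore:
    """Toy datastore: each entry is (key vector, correct character, n-gram value)."""

    def __init__(self):
        self.entries = []

    def add(self, key, target, ngram):
        self.entries.append((key, target, ngram))

    def retrieve(self, query, k):
        # Brute-force nearest neighbours; a real system would use an ANN index.
        return sorted(self.entries, key=lambda e: sq_dist(query, e[0]))[:k]


def knn_distribution(neighbors, query, query_ngram, ngram_weight=1.0):
    """Rerank neighbours by distance plus n-gram overlap, then normalise."""
    scores = defaultdict(float)
    for key, target, ngram in neighbors:
        score = -sq_dist(query, key) + ngram_weight * ngram_overlap(query_ngram, ngram)
        scores[target] += math.exp(score)
    total = sum(scores.values())
    return {char: value / total for char, value in scores.items()}


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the base CSC model's distribution with the retrieved distribution."""
    chars = set(model_probs) | set(knn_probs)
    return {
        c: (1 - lam) * model_probs.get(c, 0.0) + lam * knn_probs.get(c, 0.0)
        for c in chars
    }


# Made-up feature vectors and probabilities, for illustration only.
store = Datastore()
store.add(fuse([1, 0], [0, 1], [1, 1]), "天", ("今", "天", "好"))
store.add(fuse([0, 1], [1, 0], [0, 0]), "田", ("种", "田", "人"))

query = fuse([1, 0], [0, 1], [1, 0.5])   # token under check: the misspelled "夭"
query_ngram = ("今", "夭", "好")          # n-gram around the token in the input
model_probs = {"夭": 0.6, "天": 0.4}      # hypothetical base-model output

neighbors = store.retrieve(query, k=2)
knn_probs = knn_distribution(neighbors, query, query_ngram)
final = interpolate(model_probs, knn_probs)
prediction = max(final, key=final.get)
```

In this toy setup the base model alone would keep the erroneous character, while the retrieved neighbour with a close key and a matching surrounding n-gram shifts the combined distribution toward the correction.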

Authors (4)
  1. Xunjian Yin
  2. Xinyu Hu
  3. Jin Jiang
  4. Xiaojun Wan