A Chinese Spelling Check Framework Based on Reverse Contrastive Learning (2210.13823v2)
Abstract: Chinese spelling check is the task of detecting and correcting spelling mistakes in Chinese text. Existing research aims to enhance the text representation and to use multi-source information to improve the detection and correction capabilities of models, but pays little attention to improving their ability to distinguish between confusable words. Contrastive learning, whose aim is to minimize the distance in representation space between similar sample pairs, has recently become a dominant technique in natural language processing. Inspired by contrastive learning, we present a novel framework for Chinese spelling check, which consists of three modules: language representation, spelling check, and reverse contrastive learning. Specifically, we propose a reverse contrastive learning strategy, which explicitly forces the model to minimize the agreement between similar examples, namely the phonetically and visually confusable characters. Experimental results show that our framework is model-agnostic and can be combined with existing Chinese spelling check models to yield state-of-the-art performance.
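The reverse contrastive idea in the abstract can be pictured with an InfoNCE-style loss in which confusable characters play the role of negatives, so that lowering the loss pushes their representations away from the anchor. The following is only a minimal sketch under stated assumptions, not the formulation used in the paper: the function names, the choice of a second view of the same character as the positive sample, the temperature value, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reverse_contrastive_loss(anchor, positive, confusables, temperature=0.1):
    """InfoNCE-style loss with confusable characters as negatives.

    Assumption (not from the paper): `positive` is another representation
    of the same character, and `confusables` holds representations of
    phonetically or visually similar characters. The loss grows when a
    confusable character sits close to the anchor.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, c) / temperature) for c in confusables)
    return -math.log(pos / (pos + neg))


# Toy vectors (hypothetical): one confusable character near the anchor,
# one far from it.
anchor = [1.0, 0.0]
positive = [1.0, 0.1]
loss_close = reverse_contrastive_loss(anchor, positive, [[0.9, 0.1]])
loss_far = reverse_contrastive_loss(anchor, positive, [[0.0, 1.0]])
print(loss_close > loss_far)
```

In this toy setting the loss is larger when the confusable character lies near the anchor, which is the behaviour a "minimize the agreement between similar examples" objective is meant to penalize.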
- Nankai Lin
- Hongyan Wu
- Sihui Fu
- Shengyi Jiang
- Aimin Yang