
KNPTC: Knowledge and Neural Machine Translation Powered Chinese Pinyin Typo Correction (1805.00741v1)

Published 2 May 2018 in cs.CL and cs.AI

Abstract: Chinese pinyin input methods are very important for Chinese language processing. Users inevitably make typos when they input pinyin, and pinyin typo correction has become an increasingly important task with the popularity of smartphones and the mobile Internet. How to exploit knowledge of users' typing behaviors and support typo correction for acronym pinyin remains a challenging problem. To tackle these challenges, we propose KNPTC, a novel approach based on neural machine translation (NMT). In contrast to previous work, KNPTC integrates explicit knowledge into NMT for pinyin typo correction and learns to correct a variety of typos without the guidance of manually selected constraints or language-specific features. In this approach, we first obtain the transition probabilities between adjacent letters from large-scale real-life datasets. Then, we use these probabilities to construct the "ground-truth" alignments of training sentence pairs. Finally, these alignments are integrated into NMT to capture sensible pinyin typo correction patterns. Applied to typos in real-life datasets, KNPTC achieves an average increase of 32.77% in typo-correction accuracy over the state-of-the-art system.
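
The first step mentioned in the abstract, estimating transition probabilities between adjacent letters, might look roughly like the sketch below. This is only an illustration under stated assumptions: the abstract does not say how the probabilities are computed, so the sketch assumes a simple count-and-normalize estimate over raw pinyin strings. The function name `transition_probabilities` and the toy corpus are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def transition_probabilities(strings):
    """Estimate P(next_letter | current_letter) from adjacent-letter counts.

    Assumption: a plain maximum-likelihood estimate over adjacent letters.
    The paper may use a different estimator or different data.
    """
    pair_counts = defaultdict(Counter)
    for s in strings:
        # zip(s, s[1:]) yields every pair of adjacent letters in the string.
        for cur, nxt in zip(s, s[1:]):
            pair_counts[cur][nxt] += 1

    probs = {}
    for cur, counter in pair_counts.items():
        total = sum(counter.values())
        # Normalize each row so the probabilities for a given letter sum to 1.
        probs[cur] = {nxt: n / total for nxt, n in counter.items()}
    return probs


# Hypothetical toy corpus standing in for the large-scale real-life data.
toy_corpus = ["nihao", "zhongguo", "pinyin"]
probs = transition_probabilities(toy_corpus)
```

In this toy setting, `probs["h"]` holds the estimated distribution over letters that follow `h`. How such probabilities are turned into alignments for NMT training is described in the paper itself and is not reproduced here.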

Authors (7)
  1. Hengyi Cai (20 papers)
  2. Xingguang Ji (6 papers)
  3. Yonghao Song (13 papers)
  4. Yan Jin (35 papers)
  5. Yang Zhang (1129 papers)
  6. Mairgup Mansur (3 papers)
  7. Xiaofang Zhao (14 papers)
Citations (1)
