
PERT: A New Solution to Pinyin to Character Conversion Task (2205.11737v1)

Published 24 May 2022 in cs.CL and cs.AI

Abstract: The Pinyin-to-Character conversion (P2C) task is the core task of the Input Method Engine (IME) in commercial input software for Asian languages such as Chinese, Japanese, and Thai. It is usually treated as a sequence labelling task and solved with a language model such as an n-gram model or an RNN. However, the low capacity of n-gram models and RNNs limits their performance. This paper introduces a new solution named PERT, which stands for bidirectional Pinyin Encoder Representations from Transformers. It achieves a significant improvement in performance over the baselines. Furthermore, we combine PERT with an n-gram model under a Markov framework and improve performance further. Lastly, an external lexicon is incorporated into PERT to resolve the OOD issue of the IME.
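The abstract's idea of decoding under a Markov framework — scoring candidate characters per pinyin syllable and choosing the best transition sequence — can be sketched with a toy Viterbi decoder. The candidate lists and probabilities below are made-up illustrative values, not from the paper, and the sketch omits the neural (PERT) scorer entirely:

```python
import math

# Toy P2C decoding under a Markov framework: each pinyin syllable maps
# to candidate characters, and a bigram model scores character
# transitions. All values here are invented for illustration.

CANDIDATES = {
    "bei": ["北", "被", "背"],
    "jing": ["京", "经", "精"],
}

# Toy bigram log-probabilities; unseen pairs fall back to a floor score.
BIGRAM = {
    ("北", "京"): math.log(0.6),
}
FLOOR = math.log(1e-4)

def viterbi_p2c(pinyin):
    """Return the highest-scoring character sequence for a pinyin list."""
    # best[c] = (score, path) for partial sequences ending in character c
    best = {c: (0.0, [c]) for c in CANDIDATES[pinyin[0]]}
    for syllable in pinyin[1:]:
        nxt = {}
        for c in CANDIDATES[syllable]:
            # Extend the best-scoring predecessor path with character c.
            score, path = max(
                (s + BIGRAM.get((prev, c), FLOOR), p)
                for prev, (s, p) in best.items()
            )
            nxt[c] = (score, path + [c])
        best = nxt
    _, path = max(best.values())
    return "".join(path)

print(viterbi_p2c(["bei", "jing"]))  # → 北京
```

In the paper's setting, the per-character scores would come from the PERT encoder rather than a lookup table, with the n-gram transition scores combined under the same Markov decoding scheme.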

Authors (6)
  1. Jinghui Xiao (9 papers)
  2. Qun Liu (230 papers)
  3. Xin Jiang (242 papers)
  4. Yuanfeng Xiong (1 paper)
  5. Haiteng Wu (3 papers)
  6. Zhe Zhang (182 papers)
Citations (2)
