Decoupling recognition and transcription in Mandarin ASR (2108.01129v1)
Published 2 Aug 2021 in cs.CL, cs.SD, and eess.AS
Abstract: Much of the recent literature on automatic speech recognition (ASR) takes an end-to-end approach. Unlike English, where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of Standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far.
- Jiahong Yuan
- Xingyu Cai
- Dongji Gao
- Renjie Zheng
- Liang Huang
- Kenneth Church
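
The abstract reports its result as a character error rate (CER). As a minimal sketch of how that metric is commonly computed, the snippet below divides the Levenshtein (edit) distance between a reference transcript and a hypothesis by the reference length in characters. The function names `edit_distance` and `cer` and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring setup may differ in details such as text normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: drop a reference character
                curr[j - 1] + 1,     # insertion: extra hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative example (not from the paper): a single wrong character
# that shares its Pinyin syllable "qi" with the correct one.
reference = "今天天气很好"
hypothesis = "今天天汽很好"
print(f"{cer(reference, hypothesis):.3f}")  # prints 0.167
```

In the example, the hypothesis differs from the reference by one substituted character out of six, so the CER is 1/6. Because several Hanzi can share the same Pinyin, an error of this kind is one that a Pinyin -> Hanzi stage has to resolve from context.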