
Decoupling recognition and transcription in Mandarin ASR (2108.01129v1)

Published 2 Aug 2021 in cs.CL, cs.SD, and eess.AS

Abstract: Much of the recent literature on automatic speech recognition (ASR) takes an end-to-end approach. Unlike English, where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far.
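The abstract reports its result as a character error rate. As a rough illustration of that metric, the sketch below computes CER in the usual way: the character-level Levenshtein distance between reference and hypothesis, summed over utterances and divided by the total number of reference characters. The function names `edit_distance` and `cer` and the example strings are illustrative assumptions; the abstract does not describe the authors' scoring script, which may differ in details such as text normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] is the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference character missing in hyp
            insertion = curr[j - 1] + 1  # extra character in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(
        edit_distance(ref, hyp) for ref, hyp in zip(references, hypotheses)
    )
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example: one substituted Hanzi out of six reference characters.
    refs = ["今天天气很好"]
    hyps = ["今天天汽很好"]
    print(f"CER = {cer(refs, hyps):.3f}")
```

Because Hanzi text has no word boundaries, scoring at the character level avoids any dependence on a word segmenter, which is one reason CER rather than word error rate is the customary metric for Mandarin ASR.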

Authors (6)
  1. Jiahong Yuan
  2. Xingyu Cai
  3. Dongji Gao
  4. Renjie Zheng
  5. Liang Huang
  6. Kenneth Church
Citations (9)