Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition (1904.10045v1)

Published 27 Mar 2019 in eess.AS, cs.NE, and cs.SD

Abstract: Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model via WFST-based decoding in order to achieve promising results. This is especially important for Mandarin speech recognition, which exhibits a special phenomenon, namely homophones, that causes many substitution errors. The linguistic information introduced by the language model helps to distinguish these substitution errors. In this work, we propose a Transformer-based spelling correction model to automatically correct errors, especially substitution errors, made by a CTC-based Mandarin speech recognition system. Specifically, we investigate using the recognition results generated by CTC-based systems as input and the ground-truth transcriptions as output to train a Transformer with an encoder-decoder architecture, which is very similar to machine translation. Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model achieves a CER of 3.41%, a 22.9% and 53.2% relative improvement over the baseline CTC-based systems decoded with and without a language model, respectively.
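For reference, a CER of 3.41% with 22.9% and 53.2% relative improvements implies baseline CERs of roughly 3.41 / (1 − 0.229) ≈ 4.42% (decoding with a language model) and 3.41 / (1 − 0.532) ≈ 7.29% (without one).

The training setup described in the abstract mirrors machine translation: the CTC hypothesis is the source sequence and the reference transcript is the target. Below is a minimal PyTorch sketch of such an encoder-decoder correction model; it illustrates the general technique, not the authors' implementation, and the vocabulary size, model dimensions, and toy input tensors are assumptions.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration -- not the paper's configuration.
VOCAB = 8000      # Mandarin character vocabulary (assumption)
D_MODEL = 512
PAD_ID = 0

# Encoder-decoder Transformer: encode the CTC hypothesis,
# decode the corrected character sequence, as in machine translation.
# (Positional encodings are omitted for brevity; a real model needs them.)
model = nn.Transformer(d_model=D_MODEL, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)
src_embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_ID)
tgt_embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_ID)
out_proj = nn.Linear(D_MODEL, VOCAB)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)

def correction_loss(hyps, refs):
    """hyps: CTC recognition results, (batch, src_len) character ids.
    refs: ground-truth transcriptions, (batch, tgt_len) character ids."""
    tgt_in, tgt_out = refs[:, :-1], refs[:, 1:]   # teacher-forcing shift
    causal = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
    dec = model(src_embed(hyps), tgt_embed(tgt_in), tgt_mask=causal)
    logits = out_proj(dec)                        # (batch, tgt_len - 1, VOCAB)
    return criterion(logits.reshape(-1, VOCAB), tgt_out.reshape(-1))

# Toy batch of (hypothesis, reference) character-id pairs.
loss = correction_loss(torch.randint(1, VOCAB, (2, 12)),
                       torch.randint(1, VOCAB, (2, 12)))
loss.backward()
```

At inference time the corrected transcript would be produced by autoregressive decoding (e.g., beam search) over the encoder's representation of the CTC hypothesis.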

Authors (3)
  1. Shiliang Zhang (132 papers)
  2. Ming Lei (52 papers)
  3. Zhijie Yan (33 papers)
Citations (15)
