Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition (1904.10045v1)

Published 27 Mar 2019 in eess.AS, cs.NE, and cs.SD

Abstract: Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model via WFST-based decoding in order to achieve promising results. This is especially important for Mandarin speech recognition, which exhibits a special phenomenon, namely homophones, that causes many substitution errors. The linguistic information introduced by the language model helps to distinguish these substitution errors. In this work, we propose a Transformer-based spelling correction model to automatically correct errors, especially substitution errors, made by a CTC-based Mandarin speech recognition system. Specifically, we investigate using the recognition results generated by CTC-based systems as input and the ground-truth transcriptions as output to train a Transformer with an encoder-decoder architecture, which is very similar to machine translation. Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model achieves a CER of 3.41%, a 22.9% and 53.2% relative improvement over the baseline CTC-based systems decoded with and without a language model, respectively.
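For reference, a CER of 3.41% with 22.9% and 53.2% relative improvements implies baseline CERs of roughly 3.41 / (1 − 0.229) ≈ 4.42% (decoding with a language model) and 3.41 / (1 − 0.532) ≈ 7.29% (without one).

The training setup described in the abstract mirrors machine translation: the CTC hypothesis is the source sequence and the reference transcript is the target. Below is a minimal PyTorch sketch of such an encoder-decoder correction model; it illustrates the general technique, not the authors' implementation, and the vocabulary size, model dimensions, and toy input tensors are assumptions.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration -- not the paper's configuration.
VOCAB = 8000      # Mandarin character vocabulary (assumption)
D_MODEL = 512
PAD_ID = 0

# Encoder-decoder Transformer: encode the CTC hypothesis,
# decode the corrected character sequence, as in machine translation.
# (Positional encodings are omitted for brevity; a real model needs them.)
model = nn.Transformer(d_model=D_MODEL, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)
src_embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_ID)
tgt_embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_ID)
out_proj = nn.Linear(D_MODEL, VOCAB)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)

def correction_loss(hyps, refs):
    """hyps: CTC recognition results, (batch, src_len) character ids.
    refs: ground-truth transcriptions, (batch, tgt_len) character ids."""
    tgt_in, tgt_out = refs[:, :-1], refs[:, 1:]   # teacher-forcing shift
    causal = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
    dec = model(src_embed(hyps), tgt_embed(tgt_in), tgt_mask=causal)
    logits = out_proj(dec)                        # (batch, tgt_len - 1, VOCAB)
    return criterion(logits.reshape(-1, VOCAB), tgt_out.reshape(-1))

# Toy batch of (hypothesis, reference) character-id pairs.
loss = correction_loss(torch.randint(1, VOCAB, (2, 12)),
                       torch.randint(1, VOCAB, (2, 12)))
loss.backward()
```

At inference time the corrected transcript would be produced by autoregressive decoding (e.g., beam search) over the encoder's representation of the CTC hypothesis.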

Authors (3)
  1. Shiliang Zhang (132 papers)
  2. Ming Lei (52 papers)
  3. Zhijie Yan (33 papers)
Citations (15)
