A spelling correction model for end-to-end speech recognition (1902.07178v1)

Published 19 Feb 2019 in eess.AS, cs.AI, cs.CL, cs.LG, and cs.SD

Abstract: Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network, requiring only parallel audio-text pairs. Consequently, the LM component of the end-to-end model is trained only on transcribed audio-text pairs, which degrades performance, especially on rare words. While a variety of prior work has looked at incorporating an external LM trained on text-only data into the end-to-end framework, none has taken into account the characteristic error distribution made by the model. In this paper, we propose a novel approach to utilizing text-only data by training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model yields an 18.6% relative improvement in WER over the baseline when directly correcting the top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list with an external LM.
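The n-best rescoring step mentioned in the abstract can be illustrated with a minimal sketch: each hypothesis carries an ASR score, an external LM score is added with an interpolation weight, and the highest-scoring hypothesis is selected. The scores, weight, and hypothesis texts below are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of n-best rescoring with an external LM.
# Scores are illustrative log-probabilities; lm_weight is an
# interpolation weight one would tune on a development set.

def rescore_nbest(hypotheses, lm_weight=0.5):
    """Return the hypothesis maximizing asr_score + lm_weight * lm_score."""
    return max(hypotheses, key=lambda h: h["asr_score"] + lm_weight * h["lm_score"])

# Toy n-best list: the acoustically preferred hypothesis is the second
# entry, but the LM favors the first, and rescoring flips the ranking.
nbest = [
    {"text": "the knight rode off", "asr_score": -3.2, "lm_score": -4.0},
    {"text": "the night rode off",  "asr_score": -3.0, "lm_score": -9.5},
]

best = rescore_nbest(nbest)
print(best["text"])  # the knight rode off
```

In the paper's pipeline, the SC model first expands and corrects the hypotheses before a step of this shape picks the final output.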

Authors (3)
  1. Jinxi Guo (15 papers)
  2. Tara N. Sainath (79 papers)
  3. Ron J. Weiss (30 papers)
Citations (135)