
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space (2303.00456v3)

Published 1 Mar 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. By transferring knowledge from the pre-trained LLM and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline. Another issue with standard error correction is that the generation process is not well-guided. To address this, a constrained decoding process, based on either the N-best list or an ASR lattice, is used, which allows additional information to be propagated.
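The two ideas in the abstract, feeding the N-best list to the corrector and restricting the output to the N-best list, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The separator string `SEP`, the function names, the linear score interpolation, and the `weight` parameter are all assumptions made for illustration, and `corrector_score` stands in for whatever log-probability a fine-tuned correction model would assign to a candidate given the input.

```python
from typing import Callable, Sequence

# Hypothetical separator; the abstract does not say how hypotheses are joined.
SEP = " <sep> "


def build_nbest_input(hypotheses: Sequence[str]) -> str:
    """Join the N-best hypotheses (best first) into one model input string."""
    return SEP.join(h.strip() for h in hypotheses)


def constrained_pick(
    hypotheses: Sequence[str],
    asr_scores: Sequence[float],
    corrector_score: Callable[[str, str], float],
    weight: float = 0.5,
) -> str:
    """Return the N-best hypothesis with the best combined score.

    The output is constrained to the N-best list: the corrector only
    scores existing hypotheses instead of generating free text.
    Linear interpolation of the two scores is an assumption of this sketch.
    """
    if not hypotheses or len(hypotheses) != len(asr_scores):
        raise ValueError("need one ASR score per hypothesis")

    source = build_nbest_input(hypotheses)
    best_hyp, best_score = hypotheses[0], float("-inf")
    for hyp, asr in zip(hypotheses, asr_scores):
        combined = (1.0 - weight) * asr + weight * corrector_score(source, hyp)
        if combined > best_score:
            best_hyp, best_score = hyp, combined
    return best_hyp


def toy_corrector_score(source: str, candidate: str) -> float:
    """Toy stand-in for a model log-probability; prefers one fixed string."""
    return 0.0 if candidate == "ice cream" else -2.0


if __name__ == "__main__":
    nbest = ["i scream", "ice cream", "eye scream"]
    scores = [-1.0, -1.2, -3.0]  # made-up ASR log-scores
    print(build_nbest_input(nbest))
    print(constrained_pick(nbest, scores, toy_corrector_score))
```

In this toy run the second hypothesis is selected even though the ASR system ranked it below the first, which is the kind of reranking an N-best-constrained decoder allows. A lattice-constrained variant would search over paths in the ASR lattice rather than a fixed list and is not shown here.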

Authors (4)
  1. Rao Ma (22 papers)
  2. Mark J. F. Gales (37 papers)
  3. Kate M. Knill (13 papers)
  4. Mengjie Qian (20 papers)
Citations (29)