N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space (2303.00456v3)

Published 1 Mar 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior work uses the 1-best ASR hypothesis as input and can therefore only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. By transferring knowledge from the pre-trained LLM and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline. Another issue with standard error correction is that the generation process is not well-guided. To address this, a constrained decoding process, based on either the N-best list or an ASR lattice, is used, which allows additional information to be propagated.
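The two ideas in the abstract, feeding the N-best list to the model and constraining the output to the N-best list, can be illustrated with a minimal sketch. This is not the authors' implementation: the `<sep>` separator, the function names, and the `score_fn` interface are assumptions made for illustration, and `toy_score` is a word-overlap stand-in for the fine-tuned T5 model's score of a candidate given the input.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical separator; the abstract does not say how hypotheses are joined.
SEP = " <sep> "


def build_nbest_input(hypotheses: Sequence[str], n: int = 5) -> str:
    """Join the top-n ASR hypotheses into a single source string."""
    return SEP.join(hypotheses[:n])


def constrained_decode(
    hypotheses: Sequence[str],
    score_fn: Callable[[str, str], float],
    n: int = 5,
) -> str:
    """Restrict the output space to the N-best list and return the
    candidate that score_fn rates highest given the joined input."""
    candidates = list(hypotheses[:n])
    if not candidates:
        raise ValueError("need at least one hypothesis")
    source = build_nbest_input(candidates, n)
    return max(candidates, key=lambda hyp: score_fn(source, hyp))


def toy_score(source: str, target: str) -> float:
    """Stand-in scorer (assumption): average number of times each target
    word occurs in the source. A real system would use the model's
    log-probability of the target given the source instead."""
    counts = Counter(source.split())
    words = target.split()
    return sum(counts[w] for w in words) / max(len(words), 1)


nbest = [
    "the cat sat on the mad",
    "the cat sat on the mat",
    "the cat sad on the mat",
]
best = constrained_decode(nbest, toy_score)
```

Here the candidate whose words are best supported across the whole list is selected, even though it is not ranked first by the recognizer. Lattice-based constraints, which the abstract also mentions, would admit word sequences outside the N-best list and are not covered by this sketch.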


Authors (4)

Rao Ma
Mark J. F. Gales
Kate M. Knill
Mengjie Qian

Citations (29)


