Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network (2305.12493v5)

Published 21 May 2023 in eess.AS, cs.CL, and cs.SD

Abstract: Contextual information plays a crucial role in speech recognition technologies, and incorporating it into end-to-end speech recognition models has drawn considerable interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context phrases in utterances using contextual embeddings and calculates a bias loss to assist in training the contextualized model. Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER on the context phrases decreases by 40.5% relative. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list.
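The abstract names three mechanisms (attention over context phrase embeddings, a phrase-prediction bias loss, and phrase filtering) without giving their exact form. The pure-Python sketch below illustrates the general idea under stated assumptions; it is not the paper's implementation. Assumptions: scaled dot-product attention from acoustic frames to phrase embeddings, a "<no-bias>" entry at index 0, a per-phrase presence score taken as the maximum attention weight over frames, binary cross-entropy as a stand-in bias loss, and a fixed score threshold for filtering. All function names, the toy vectors, and the phrase names `phrase_a`/`phrase_b` are hypothetical. The helper `relative_reduction` shows how a "relative" WER improvement such as 12.1% is computed.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bias_attention(frames: List[Vector], phrase_embs: List[Vector]) -> List[List[float]]:
    """Scaled dot-product attention of each acoustic frame over the context phrases.

    Row t holds the weights of frame t. Index 0 is assumed (hypothetically) to be
    a "<no-bias>" entry, so a frame can attend to no real phrase.
    """
    scale = 1.0 / math.sqrt(len(phrase_embs[0]))
    return [softmax([dot(f, p) * scale for p in phrase_embs]) for f in frames]


def phrase_presence(weights: List[List[float]]) -> List[float]:
    """Hypothetical utterance-level score per real phrase: max weight over frames."""
    n = len(weights[0])
    return [max(row[j] for row in weights) for j in range(1, n)]  # skip <no-bias>


def bias_loss(scores: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Stand-in bias loss: mean binary cross-entropy against 0/1 presence labels."""
    total = 0.0
    for s, y in zip(scores, labels):
        s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(scores)


def filter_phrases(names: Sequence[str], scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Keep only phrases whose presence score reaches a (hypothetical) threshold."""
    return [n for n, s in zip(names, scores) if s >= threshold]


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]                    # two toy acoustic frames
    phrase_embs = [[0.0, 0.0], [4.0, 0.0], [0.0, -4.0]]  # <no-bias>, phrase_a, phrase_b
    names = ["phrase_a", "phrase_b"]

    w = bias_attention(frames, phrase_embs)
    scores = phrase_presence(w)
    print(filter_phrases(names, scores))                 # ['phrase_a']
    print(round(bias_loss(scores, [1, 0]), 3))
    print(relative_reduction(10.0, 8.0))                 # 20.0
```

In this toy run, only `phrase_a` attracts strong attention from a frame, so it is the only phrase kept after filtering; shrinking a large biasing list this way is one plausible reading of how filtering could limit degradation, though the abstract does not specify the paper's criterion. The WER values passed to `relative_reduction` are illustrative, not results from the paper.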

Authors (7)

Kaixun Huang
Ao Zhang
Zhanheng Yang
Pengcheng Guo
Bingshen Mu
Tianyi Xu
Lei Xie

Citations (16)
