
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network (2305.12493v5)

Published 21 May 2023 in eess.AS, cs.CL, and cs.SD

Abstract: Contextual information plays a crucial role in speech recognition technologies, and incorporating it into end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts the context phrases in an utterance using contextual embeddings and calculates a bias loss to assist in training the contextualized model. Our method achieves a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER on context phrases decreases by a relative 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list.

Authors (7)
  1. Kaixun Huang
  2. Ao Zhang
  3. Zhanheng Yang
  4. Pengcheng Guo
  5. Bingshen Mu
  6. Tianyi Xu
  7. Lei Xie
Citations (16)
