
A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting (2309.09552v4)

Published 18 Sep 2023 in cs.AI and cs.CL

Abstract: Recognizing rare named entities, such as personal names and technical terms, is challenging for automatic speech recognition (ASR) systems, especially when these entities are infrequent in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. The recognized entities then serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns both the OV-KWS and contextual-ASR tasks. We evaluate our approach on the Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves entity recall compared to the original Whisper model. Moreover, we demonstrate that OV-KWS can serve as a plug-and-play module to enhance ASR error correction methods and frozen Whisper models.
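
The step in which spotted entities become a decoder prompt can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-keyword scores, the 0.5 threshold, and the comma-separated prompt format are all assumptions made for illustration, and the scores stand in for whatever an OV-KWS module would output from the encoder hidden states.

```python
from typing import Dict, List


def select_keywords(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep user-defined entities whose spotting score meets the threshold.

    The scores and the threshold are hypothetical placeholders for the
    output of an open-vocabulary keyword spotter.
    """
    hits = [kw for kw, score in scores.items() if score >= threshold]
    # Order the detected entities from most to least confident.
    return sorted(hits, key=lambda kw: scores[kw], reverse=True)


def build_prompt(keywords: List[str], sep: str = ", ") -> str:
    """Join detected entities into a text prompt (format is an assumption)."""
    return sep.join(keywords)


# Hypothetical scores for three user-defined entities.
scores = {"Aishell": 0.91, "Whisper": 0.78, "Kaldi": 0.12}
prompt = build_prompt(select_keywords(scores))
print(prompt)
```

In this sketch only entities that pass the threshold reach the prompt, so the decoder is biased toward entities that were plausibly spoken rather than toward the full user-defined list.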

Authors (10)
  1. Yuang Li (18 papers)
  2. Yinglu Li (6 papers)
  3. Min Zhang (630 papers)
  4. Chang Su (37 papers)
  5. Mengxin Ren (16 papers)
  6. Xiaosong Qiao (5 papers)
  7. Miaomiao Ma (5 papers)
  8. Hao Yang (328 papers)
  9. Daimeng Wei (31 papers)
  10. Shimin Tao (31 papers)
Citations (5)
