
RescoreBERT: Discriminative Speech Recognition Rescoring with BERT (2202.01094v3)

Published 2 Feb 2022 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: Second-pass rescoring is an important component in automatic speech recognition (ASR) systems that is used to improve the outputs from a first-pass decoder by implementing lattice rescoring or $n$-best re-ranking. While pretraining with a masked language model (MLM) objective has achieved great success in various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, to incorporate the improvements of a discriminative loss into fine-tuning of deep bidirectional pretrained models for ASR. Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. We further propose an alternative discriminative loss. This approach, which we call RescoreBERT, reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without a discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
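To make the idea of $n$-best re-ranking with an MWER objective concrete, here is a minimal sketch in plain Python. It follows a commonly used formulation of MWER (the expected word-error count under a softmax over hypothesis scores, relative to the mean error of the list); the abstract does not spell out its exact formula, so treat this as an assumption. The function names `mwer_loss` and `rerank`, the `weight` parameter, and the convention that lower scores are better are all hypothetical and chosen for illustration only.

```python
import math


def mwer_loss(scores, word_errors):
    """Expected relative word errors over one n-best list.

    Assumption: `scores` are combined hypothesis scores where LOWER is
    better, so the posterior is a softmax over the negated scores.
    """
    if not scores or len(scores) != len(word_errors):
        raise ValueError("scores and word_errors must be non-empty and equal length")

    # Numerically stable softmax over negated scores.
    negated = [-s for s in scores]
    peak = max(negated)
    exps = [math.exp(v - peak) for v in negated]
    total = sum(exps)
    posteriors = [e / total for e in exps]

    # Subtract the mean error of the list so the loss measures how much
    # probability mass sits on better-than-average hypotheses.
    mean_err = sum(word_errors) / len(word_errors)
    return sum(p * (e - mean_err) for p, e in zip(posteriors, word_errors))


def rerank(first_pass_scores, second_pass_scores, weight=1.0):
    """Return the index of the best hypothesis after score interpolation.

    Assumption: the final score is first-pass score plus a weighted
    second-pass (rescoring model) score, lower is better.
    """
    combined = [a + weight * l for a, l in zip(first_pass_scores, second_pass_scores)]
    return min(range(len(combined)), key=combined.__getitem__)
```

In this sketch the loss is zero when all hypotheses receive the same score and becomes negative as probability mass shifts toward hypotheses with fewer word errors than the list average, which is the direction training would push the rescoring model.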

Authors (8)
  1. Liyan Xu (28 papers)
  2. Yile Gu (25 papers)
  3. Jari Kolehmainen (13 papers)
  4. Haidar Khan (21 papers)
  5. Ankur Gandhe (30 papers)
  6. Ariya Rastrow (55 papers)
  7. Andreas Stolcke (57 papers)
  8. Ivan Bulyko (23 papers)
Citations (44)