Discriminative Speech Recognition Rescoring with Pre-trained Language Models (2310.06248v1)

Published 10 Oct 2023 in eess.AS

Abstract: Second-pass rescoring is a critical component of competitive automatic speech recognition (ASR) systems. LLMs have demonstrated their ability to use pre-trained information for better rescoring of ASR hypotheses. Discriminative training, directly optimizing the minimum word-error-rate (MWER) criterion, typically improves rescoring. In this study, we propose and explore several discriminative fine-tuning schemes for pre-trained LMs. We propose two architectures based on different pooling strategies over output embeddings and compare them with probability-based MWER. We conduct detailed comparisons between pre-trained causal and bidirectional LMs in discriminative settings. Experiments on LibriSpeech demonstrate that all MWER training schemes are beneficial, giving additional gains of up to 8.5% WER. The proposed pooling variants achieve lower latency while retaining most of the improvement. Finally, our study concludes that bidirectionality is better utilized with discriminative training.
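The MWER criterion mentioned in the abstract is commonly formulated as the expected word errors over an N-best list, where the expectation is taken under the softmax-normalized rescoring scores. The sketch below illustrates that standard formulation with plain Python; the function name and the mean-error baseline term are illustrative, not taken from the paper.

```python
import math

def mwer_loss(scores, word_errors):
    """Expected (relative) word errors over an N-best list, the usual
    MWER objective for discriminative rescoring. Lower is better.

    scores: rescoring scores (e.g., LM log-likelihoods), one per hypothesis
    word_errors: word-error counts of each hypothesis against the reference
    """
    # Softmax over hypothesis scores -> posterior P(h_i | x)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Subtract the mean error, a common variance-reduction baseline
    mean_err = sum(word_errors) / len(word_errors)
    return sum(p * (w - mean_err) for p, w in zip(probs, word_errors))

# Toy 3-best list: putting more score mass on the lowest-error
# hypothesis drives the loss down (below the mean-error baseline).
loss = mwer_loss([2.0, 0.5, 0.1], [1, 3, 4])
```

Minimizing this loss pushes probability mass toward low-error hypotheses; the probability-based and pooling-based variants in the paper differ in how the per-hypothesis scores are produced from the LM, not in this outer objective.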

Authors (6)
  1. Prashanth Gurunath Shivakumar (18 papers)
  2. Jari Kolehmainen (13 papers)
  3. Yile Gu (25 papers)
  4. Ankur Gandhe (30 papers)
  5. Ariya Rastrow (55 papers)
  6. Ivan Bulyko (23 papers)
Citations (2)