Distillation Strategies for Discriminative Speech Recognition Rescoring (2306.09452v1)

Published 15 Jun 2023 in eess.AS

Abstract: Second-pass rescoring is employed in most state-of-the-art speech recognition systems. Recently, BERT-based models have gained popularity for re-ranking the n-best hypotheses by exploiting knowledge from masked language model pre-training. Further, fine-tuning with a discriminative loss such as minimum word error rate (MWER) has been shown to perform better than likelihood-based losses. Streaming applications with low-latency requirements impose significant constraints on model size, thereby limiting the achievable word error rate (WER) gains. In this paper, we propose effective strategies for distilling from large models discriminatively trained with the MWER objective. We experiment on Librispeech and a production-scale internal voice-assistant dataset. Our results demonstrate relative WER improvements of up to 7% over student models trained with MWER. We also show that the proposed distillation can reduce the WER gap between the student and the teacher by 62% to 100%.
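To make the two ingredients in the abstract concrete, here is a minimal PyTorch sketch of (a) an MWER loss over an n-best list and (b) a KL-based distillation term that pulls a small student rescorer toward a larger MWER-trained teacher. The specific combination, the interpolation weight `alpha`, and the temperature are illustrative assumptions for exposition, not the paper's exact distillation strategies.

```python
# Hypothetical sketch (not the paper's exact recipe): combine an MWER loss over
# the n-best list with a KL term that distills the teacher's hypothesis
# distribution into the student rescorer.
import torch
import torch.nn.functional as F

def mwer_loss(scores: torch.Tensor, word_errors: torch.Tensor) -> torch.Tensor:
    """Minimum word error rate loss for one utterance.

    scores:      (n_best,) rescoring scores, higher = more likely
    word_errors: (n_best,) edit distance of each hypothesis to the reference
    """
    probs = F.softmax(scores, dim=-1)                # hypothesis posterior over the n-best list
    relative_err = word_errors - word_errors.mean()  # baseline-subtracted error (variance reduction)
    return torch.sum(probs * relative_err)

def distill_kl(student_scores: torch.Tensor, teacher_scores: torch.Tensor,
               temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) between the two n-best hypothesis distributions."""
    teacher_logp = F.log_softmax(teacher_scores / temperature, dim=-1)
    student_logp = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_logp, log_target=True, reduction="sum")

def distillation_objective(student_scores, teacher_scores, word_errors, alpha=0.5):
    # `alpha` is an illustrative hyperparameter balancing the two terms.
    return alpha * mwer_loss(student_scores, word_errors) + \
           (1.0 - alpha) * distill_kl(student_scores, teacher_scores)
```

In practice, `scores` would come from the second-pass rescoring model evaluated on each first-pass hypothesis, and `word_errors` from aligning each hypothesis against the reference transcript.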

Authors (6)
  1. Prashanth Gurunath Shivakumar (18 papers)
  2. Jari Kolehmainen (13 papers)
  3. Yile Gu (25 papers)
  4. Ankur Gandhe (30 papers)
  5. Ariya Rastrow (55 papers)
  6. Ivan Bulyko (23 papers)
Citations (3)
