
RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions (2005.03271v3)

Published 7 May 2020 in eess.AS and cs.CL

Abstract: In recent years, all-neural end-to-end approaches have obtained state-of-the-art results on several challenging automatic speech recognition (ASR) tasks. However, most existing works focus on building ASR models where train and test data are drawn from the same domain. This results in poor generalization characteristics on mismatched domains: e.g., end-to-end models trained on short segments perform poorly when evaluated on longer utterances. In this work, we analyze the generalization properties of streaming and non-streaming recurrent neural network transducer (RNN-T) based end-to-end models in order to identify model components that negatively affect generalization performance. We propose two solutions: combining multiple regularization techniques during training, and using dynamic overlapping inference. On a long-form YouTube test set, when the non-streaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%. Finally, when trained on LibriSpeech, we find that dynamic overlapping inference improves WER on YouTube from 99.8% to 33.0%.
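The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the Python below may help in reading the figures. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_reduction` are hypothetical, and whitespace tokenization without text normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its starting value."""
    return (before - after) / before
```

Because insertions count as errors, WER can exceed 100%. Applied to the figures quoted in the abstract, `relative_reduction(22.3, 14.8)` comes to roughly a one-third relative reduction in WER for the non-streaming model.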

Authors (11)
  1. Chung-Cheng Chiu
  2. Arun Narayanan
  3. Wei Han
  4. Rohit Prabhavalkar
  5. Yu Zhang
  6. Navdeep Jaitly
  7. Ruoming Pang
  8. Tara N. Sainath
  9. Patrick Nguyen
  10. Liangliang Cao
  11. Yonghui Wu
Citations (41)