Noisy Training Improves E2E ASR for the Edge (2107.04677v1)

Published 9 Jul 2021 in cs.CL

Abstract: Automatic speech recognition (ASR) has become increasingly ubiquitous on modern edge devices. Past work developed streaming End-to-End (E2E) all-neural speech recognizers that can run compactly on edge devices. However, E2E ASR models are prone to overfitting and have difficulty generalizing to unseen test data. Various techniques have been proposed to regularize the training of ASR models, including layer normalization, dropout, spectrum data augmentation, and speed distortions in the inputs. In this work, we present a simple yet effective noisy training strategy to further improve E2E ASR model training. By introducing random noise into the parameter space during training, our method produces smoother models at convergence that generalize better. We apply noisy training to improve both dense and sparse state-of-the-art Emformer models and observe consistent WER reduction. Specifically, when training Emformers with 90% sparsity, we achieve 12% and 14% WER improvements on the LibriSpeech Test-other and Test-clean sets, respectively.
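The core idea described in the abstract, perturbing the weights with random noise at each training step while applying the update to the clean weights, can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the Gaussian noise model, the fixed noise scale, and the toy quadratic objective are all illustrative assumptions.

```python
import numpy as np

def noisy_training_step(params, grad_fn, lr=0.05, noise_std=0.01, rng=None):
    """One step of noisy training (illustrative sketch).

    Evaluate the gradient at a randomly perturbed copy of the
    parameters, then apply the update to the clean parameters.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, noise_std, size=params.shape)
    grad = grad_fn(params + noise)   # gradient at the perturbed weights
    return params - lr * grad        # update the unperturbed weights

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
rng = np.random.default_rng(0)
w = np.ones(4)
for _ in range(200):
    w = noisy_training_step(w, lambda p: 2.0 * p, lr=0.05,
                            noise_std=0.01, rng=rng)
```

Evaluating the gradient at a perturbed point biases the optimizer toward regions where the loss stays low under small weight perturbations, which is the intuition behind the smoother, better-generalizing models the abstract reports.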

Authors (9)
  1. Dilin Wang
  2. Yuan Shangguan
  3. Haichuan Yang
  4. Pierce Chuang
  5. Jiatong Zhou
  6. Meng Li
  7. Ganesh Venkatesh
  8. Ozlem Kalinli
  9. Vikas Chandra
Citations (4)