SpecAugment on Large Scale Datasets (1912.05533v1)

Published 11 Dec 2019 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Narayanan et al., 2018). We achieve improvement across all test domains by mixing raw training data augmented with SpecAugment and noise-perturbed training data when training the acoustic model. We also introduce a modification of SpecAugment that adapts the time mask size and/or multiplicity depending on the length of the utterance, which can potentially benefit large scale tasks. By using adaptive masking, we are able to further improve the performance of the Listen, Attend and Spell model on LibriSpeech to 2.2% WER on test-clean and 5.2% WER on test-other.
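The adaptive masking described in the abstract scales the time-mask parameters with utterance length rather than fixing them globally. Below is a minimal NumPy sketch of that idea; the function name `adaptive_time_mask` and the arguments `size_ratio` and `mult_ratio` are illustrative stand-ins for the paper's adaptive size and adaptive multiplicity ratios, and the default values are placeholders, not the paper's settings.

```python
import numpy as np

def adaptive_time_mask(spec, size_ratio=0.05, mult_ratio=0.04, rng=None):
    """Adaptive SpecAugment-style time masking (illustrative sketch).

    spec:       log-mel spectrogram of shape (time, freq).
    size_ratio: max mask width as a fraction of utterance length
                (adaptive time mask size).
    mult_ratio: number of masks as a fraction of utterance length
                (adaptive multiplicity).
    """
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    tau = out.shape[0]                     # utterance length in frames
    T = max(1, int(size_ratio * tau))      # mask size grows with length
    n_masks = int(mult_ratio * tau)        # mask count grows with length
    for _ in range(n_masks):
        t = int(rng.integers(0, T + 1))    # sampled mask width in [0, T]
        t0 = int(rng.integers(0, max(1, tau - t)))
        out[t0:t0 + t, :] = 0.0            # zero out the masked time band
    return out
```

With these placeholder ratios, a 1000-frame utterance receives roughly 40 masks of up to 50 frames each, while a 100-frame utterance gets about 4 masks of up to 5 frames, so short utterances are not over-masked.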

Authors (8)
  1. Daniel S. Park (30 papers)
  2. Yu Zhang (1399 papers)
  3. Chung-Cheng Chiu (48 papers)
  4. Youzheng Chen (5 papers)
  5. Bo Li (1107 papers)
  6. William Chan (54 papers)
  7. Quoc V. Le (128 papers)
  8. Yonghui Wu (115 papers)
Citations (130)