Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation (2210.15398v2)

Published 27 Oct 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Data augmentation is a technique to generate new training data based on existing data. We evaluate the simple and cost-effective method of concatenating the original data examples to build new training instances. Continued training with such augmented data is able to improve off-the-shelf Transformer and Conformer models that were optimized on the original data only. We demonstrate considerable improvements on the LibriSpeech-960h test sets (WER 2.83 and 6.87 for test-clean and test-other), which carry over to models combined with shallow fusion (WER 2.55 and 6.27). Our method of continued training also leads to improvements of up to 0.9 WER on the ASR part of CoVoST-2 for four non-English languages, and we observe that the gains are highly dependent on the size of the original training data. We compare different concatenation strategies and find that our method does not need speaker information to achieve its improvements. Finally, we demonstrate on two datasets that our method also works for speech translation tasks.
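The core idea of the paper is to build new training instances by concatenating pairs of existing (audio, transcript) examples. A minimal sketch of this style of augmentation is shown below; the function name, dictionary fields, and random pairing strategy are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def concat_augment(examples, seed=0):
    """Concatenation-based augmentation sketch: pair each example with a
    randomly chosen partner and join both the audio samples and the
    transcripts. Field names ("audio", "text") are assumptions."""
    rng = np.random.default_rng(seed)
    partners = rng.permutation(len(examples))
    augmented = []
    for i, j in zip(range(len(examples)), partners):
        a, b = examples[i], examples[j]
        augmented.append({
            # Join raw waveforms end-to-end to form a longer utterance.
            "audio": np.concatenate([a["audio"], b["audio"]]),
            # Join transcripts with a space so the target stays well-formed.
            "text": a["text"] + " " + b["text"],
        })
    return augmented
```

The augmented set would then be mixed with the original data for continued training of an already-converged model, rather than training from scratch.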

Authors (3)
  1. Tsz Kin Lam (13 papers)
  2. Shigehiko Schamoni (10 papers)
  3. Stefan Riezler (44 papers)
Citations (6)
