
Speech Synthesis as Augmentation for Low-Resource ASR (2012.13004v1)

Published 23 Dec 2020 in cs.CL, cs.SD, and eess.AS

Abstract: Speech synthesis might hold the key to low-resource speech recognition. Data augmentation techniques have become an essential part of modern speech recognition training. Yet, they are simple, naive, and rarely reflect real-world conditions. Meanwhile, speech synthesis techniques have been rapidly getting closer to the goal of achieving human-like speech. In this paper, we investigate the possibility of using synthesized speech as a form of data augmentation to lower the resources necessary to build a speech recognizer. We experiment with three different kinds of synthesizers: statistical parametric, neural, and adversarial. Our findings are interesting and point to new research directions for the future.

Authors (4)
  1. Deblin Bagchi
  2. Shannon Wotherspoon
  3. Zhuolin Jiang
  4. Prasanna Muthukumar
Citations (2)
