Speech Recognition with Augmented Synthesized Speech (1909.11699v1)

Published 25 Sep 2019 in cs.CL, cs.SD, and eess.AS

Abstract: Recent success of the Tacotron speech synthesis architecture and its variants in producing natural-sounding multi-speaker synthesized speech has raised the exciting possibility of replacing the expensive, manually transcribed, domain-specific human speech that is used to train speech recognizers. The multi-speaker speech synthesis architecture can learn latent embedding spaces of prosody, speaker, and style variations derived from input acoustic representations, thereby allowing for manipulation of the synthesized speech. In this paper, we evaluate the feasibility of enhancing speech recognition performance with speech synthesis, using two corpora from different domains. We explore algorithms to provide the acoustic and lexical diversity needed for robust speech recognition. Finally, we demonstrate the feasibility of this approach as a data augmentation strategy for domain transfer. We find that improvements to speech recognition performance are achievable by augmenting training data with synthesized material. However, there remains a substantial gap in performance between recognizers trained on human speech and those trained on synthesized speech.

Authors (7)
  1. Andrew Rosenberg (32 papers)
  2. Yu Zhang (1400 papers)
  3. Bhuvana Ramabhadran (47 papers)
  4. Ye Jia (33 papers)
  5. Pedro Moreno (10 papers)
  6. Yonghui Wu (115 papers)
  7. Zelin Wu (12 papers)
Citations (123)
