Voice command generation using Progressive Wavegans (1903.07395v1)

Published 13 Mar 2019 in cs.CL, cs.LG, cs.SD, eess.AS, and stat.ML

Abstract: Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions - preprocessing, Audio-to-Audio generation, skip connections and progressive structures - is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human likeness using the proposed extensions compared to the original WaveGAN approach.

Citations (1)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Voice command generation using Progressive Wavegans (1903.07395v1)

Summary

Related Papers