Semi-Supervised Generative Modeling for Controllable Speech Synthesis (1910.01709v1)

Published 3 Oct 2019 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: We present a novel generative model that combines state-of-the-art neural text-to-speech (TTS) with semi-supervised probabilistic latent variable models. By providing partial supervision to some of the latent variables, we are able to force them to take on consistent and interpretable purposes, something that has not previously been possible with purely unsupervised TTS models. We demonstrate that our model is able to reliably discover and control important but rarely labelled attributes of speech, such as affect and speaking rate, with as little as 1% supervision (30 minutes of labelled data). Even at such low supervision levels we do not observe a degradation of synthesis quality compared to a state-of-the-art baseline. Audio samples are available on the web.
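The abstract does not state the training objective. To illustrate what "partial supervision of some latent variables" can mean in general, here is a minimal sketch assuming the standard semi-supervised VAE bound of Kingma et al. (2014). It uses a single discrete attribute y (for example an affect class), and per-class ELBO values are assumed to be precomputed by some TTS model. This is not taken from this paper: the function names, the tuple layout, and the weight `alpha` are all hypothetical.

```python
# Hypothetical sketch of a Kingma et al. (2014)-style semi-supervised bound
# for one discrete attribute y; NOT the exact objective used in this paper.
import math
from typing import List, Sequence


def unlabelled_bound(elbo_per_class: Sequence[float], q_y: Sequence[float]) -> float:
    """Bound for an utterance whose attribute label y is unobserved.

    elbo_per_class[k] is the (assumed precomputed) ELBO L(x, y=k);
    q_y[k] is the classifier probability q(y=k | x). Returns
    sum_k q(k|x) * L(x, k) + H(q(.|x)).
    """
    expected = sum(q * l for q, l in zip(q_y, elbo_per_class))
    entropy = -sum(q * math.log(q) for q in q_y if q > 0.0)  # treats 0 * log 0 as 0
    return expected + entropy


def semi_supervised_loss(
    labelled: List[tuple],    # (elbo_per_class, y, q_y) with observed label y
    unlabelled: List[tuple],  # (elbo_per_class, q_y) with y unobserved
    alpha: float = 1.0,       # hypothetical weight on the classification term
) -> float:
    """Negative objective to minimise over a mixed batch."""
    total = 0.0
    for elbo_per_class, y, q_y in labelled:
        # Supervision fixes y: use the bound at the observed label and also
        # train the classifier q(y|x) to predict it (assumes q_y[y] > 0).
        total += elbo_per_class[y] + alpha * math.log(q_y[y])
    for elbo_per_class, q_y in unlabelled:
        # Without a label, y is marginalised under the classifier.
        total += unlabelled_bound(elbo_per_class, q_y)
    return -total
```

Labelled utterances tie the attribute latent to a named label. Unlabelled utterances only need the classifier's distribution over that label. In this framework, that split is why a small labelled fraction can still give a latent a consistent, interpretable meaning.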

Authors (8)
  1. Raza Habib (5 papers)
  2. Soroosh Mariooryad (11 papers)
  3. Matt Shannon (10 papers)
  4. Eric Battenberg (14 papers)
  5. RJ Skerry-Ryan (21 papers)
  6. Daisy Stanton (12 papers)
  7. David Kao (10 papers)
  8. Tom Bagby (9 papers)
Citations (47)
