
Parallel WaveNet conditioned on VAE latent vectors (2012.09703v1)

Published 17 Dec 2020 in eess.AS and cs.SD

Abstract: Recently the state-of-the-art text-to-speech synthesis systems have shifted to a two-model approach: a sequence-to-sequence model to predict a representation of speech (typically mel-spectrograms), followed by a 'neural vocoder' model which produces the time-domain speech waveform from this intermediate speech representation. This approach is capable of synthesizing speech that is confusable with natural speech recordings. However, the inference speed of neural vocoder approaches represents a major obstacle for deploying this technology for commercial applications. Parallel WaveNet is one approach which has been developed to address this issue, trading off some synthesis quality for significantly faster inference speed. In this paper we investigate the use of a sentence-level conditioning vector to improve the signal quality of a Parallel WaveNet neural vocoder. We condition the neural vocoder with the latent vector from a pre-trained VAE component of a Tacotron 2-style sequence-to-sequence model. With this, we are able to significantly improve the quality of vocoded speech.
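The core idea in the abstract is to give the neural vocoder a sentence-level conditioning vector alongside its usual frame-level mel-spectrogram input. A minimal sketch of that conditioning step, assuming the latent is simply broadcast across frames and appended to the local features (the abstract does not specify how the latent is injected, and all names here are illustrative):

```python
import numpy as np

def add_global_conditioning(mel_frames: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Broadcast a sentence-level latent z across all mel frames.

    mel_frames: (T, n_mels) frame-level conditioning for the vocoder.
    z: (d_z,) sentence-level latent from the pre-trained VAE.
    Returns (T, n_mels + d_z): per-frame conditioning with z appended,
    so every vocoder timestep sees the same utterance-level vector.
    """
    T = mel_frames.shape[0]
    z_tiled = np.tile(z[None, :], (T, 1))  # repeat z for every frame
    return np.concatenate([mel_frames, z_tiled], axis=1)

# Hypothetical shapes: 5 mel frames of 80 bins, a 16-dim VAE latent.
cond = add_global_conditioning(np.zeros((5, 80)), np.ones(16))
print(cond.shape)  # (5, 96)
```

In practice a Parallel WaveNet would consume such conditioning through its upsampling/conditioning network rather than raw concatenation; the sketch only illustrates the "one vector per sentence, shared across all frames" aspect.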

Authors (7)
  1. Jonas Rohnke (5 papers)
  2. Tom Merritt (1 paper)
  3. Jaime Lorenzo-Trueba (33 papers)
  4. Vatsal Aggarwal (5 papers)
  5. Alexis Moinet (22 papers)
  6. Roberto Barra-Chicote (24 papers)
  7. Adam Gabrys (8 papers)
Citations (3)