Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech (2207.06088v1)

Published 13 Jul 2022 in cs.SD and eess.AS

Abstract: Recent studies have demonstrated the feasibility of single-stage neural text-to-speech, which generates raw waveforms directly from text without first generating mel-spectrograms. Single-stage text-to-speech often faces two problems: a) the one-to-many mapping problem caused by multiple speech variations and b) insufficient high-frequency reconstruction caused by the lack of supervision from ground-truth acoustic features during training. To solve problem a) and generate more expressive speech, we propose a novel phoneme-level prosody modeling method based on a variational autoencoder with normalizing flows to model the underlying prosodic information in speech. We also use a prosody predictor to support end-to-end expressive speech synthesis. Furthermore, to solve problem b), we propose a dual parallel autoencoder that introduces supervision from ground-truth acoustic features during training, enabling our model to generate high-quality speech. We compare the synthesis quality with state-of-the-art text-to-speech systems on an internal expressive English dataset. Both qualitative and quantitative evaluations demonstrate the superiority and robustness of our method for lossless speech generation while also showing a strong capability in prosody modeling.

Authors (8)
  1. Zhengxi Liu (4 papers)
  2. Qiao Tian (27 papers)
  3. Chenxu Hu (12 papers)
  4. Xudong Liu (41 papers)
  5. Menglin Wu (3 papers)
  6. Yuping Wang (56 papers)
  7. Hang Zhao (156 papers)
  8. Yuxuan Wang (239 papers)
Citations (10)
