
A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis (1804.02549v1)

Published 7 Apr 2018 in eess.AS, cs.CL, cs.SD, and stat.ML

Abstract: Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network and the AR model performed best. Evaluation on vocoders by using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. Particularly, generated speech waveforms from the combination of AR acoustic model and Wavenet vocoder achieved a similar score of speech quality to vocoded speech.

Authors (5)
  1. Xin Wang (1307 papers)
  2. Jaime Lorenzo-Trueba (33 papers)
  3. Shinji Takaki (16 papers)
  4. Lauri Juvela (23 papers)
  5. Junichi Yamagishi (178 papers)
Citations (67)