Papers
Topics
Authors
Recent
2000 character limit reached

Rich Prosody Diversity Modelling with Phone-level Mixture Density Network

Published 1 Feb 2021 in cs.SD | (2102.00851v3)

Abstract: Generating natural speech with diverse and smooth prosody pattern is a challenging task. Although random sampling with phone-level prosody distribution has been investigated to generate different prosody patterns, the diversity of the generated speech is still very limited and far from what can be achieved by human. This is largely due to the use of uni-modal distribution, such as single Gaussian, in the prior works of phone-level prosody modelling. In this work, we propose a novel approach that models phone-level prosodies with GMM based mixture density network (GMM-MDN). Experiments on the LJSpeech dataset demonstrate that phone-level prosodies can precisely control the synthetic speech and GMM-MDN can generate more natural and smooth prosody pattern than a single Gaussian. Subjective evaluations further show that the proposed approach not only achieves better naturalness, but also significantly improves the prosody diversity in synthetic speech without the need of manual control.

Citations (16)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.