
CAMP: a Two-Stage Approach to Modelling Prosody in Context (2011.01175v2)

Published 2 Nov 2020 in eess.AS

Abstract: Prosody is an integral part of communication, but remains an open problem in state-of-the-art speech synthesis. There are two major issues faced when modelling prosody: (1) prosody varies at a slower rate compared with other content in the acoustic signal (e.g. segmental information and background noise); (2) determining appropriate prosody without sufficient context is an ill-posed problem. In this paper, we propose solutions to both these issues. To mitigate the challenge of modelling a slow-varying signal, we learn to disentangle prosodic information using a word level representation. To alleviate the ill-posed nature of prosody modelling, we use syntactic and semantic information derived from text to learn a context-dependent prior over our prosodic space. Our Context-Aware Model of Prosody (CAMP) outperforms the state-of-the-art technique, closing the gap with natural speech by 26%. We also find that replacing attention with a jointly-trained duration model improves prosody significantly.
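The abstract's two main ideas, a word-level prosody representation and a context-dependent prior over the prosodic latent space, can be sketched minimally. The code below is an illustrative assumption, not the paper's implementation: it pools frame-level acoustic features to word level (one common way to obtain a slow-varying, word-rate representation) and computes the diagonal-Gaussian KL term that would tie a text-derived context prior to the learned prosody posterior.

```python
import numpy as np

def pool_to_word_level(frame_feats, word_boundaries):
    """Average frame-level features within each word span.

    frame_feats: (T, D) array of frame-level acoustic features.
    word_boundaries: list of (start, end) frame indices, one pair per word.
    Returns an (n_words, D) word-level representation.
    """
    return np.stack([frame_feats[s:e].mean(axis=0) for s, e in word_boundaries])

def diag_gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians.

    mu_q, logvar_q: posterior parameters (e.g. from a prosody encoder).
    mu_p, logvar_p: prior parameters (e.g. predicted from syntactic/semantic
    text features). Summed over dimensions.
    """
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Hypothetical usage: 6 frames of 2-D features, two words of 3 frames each.
feats = np.arange(12, dtype=float).reshape(6, 2)
word_prosody = pool_to_word_level(feats, [(0, 3), (3, 6)])  # shape (2, 2)

# KL is zero when the context prior matches the posterior exactly.
mu = np.zeros(4)
kl = diag_gaussian_kl(mu, mu, mu, mu)
```

The word boundaries here would come from a forced alignment or, as the abstract suggests, from the jointly-trained duration model that replaces attention.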

Authors (9)
  1. Zack Hodari (6 papers)
  2. Alexis Moinet (22 papers)
  3. Sri Karlapati (13 papers)
  4. Jaime Lorenzo-Trueba (33 papers)
  5. Thomas Merritt (16 papers)
  6. Arnaud Joly (14 papers)
  7. Ammar Abbas (12 papers)
  8. Penny Karanasou (11 papers)
  9. Thomas Drugman (61 papers)
Citations (28)
