
CAMP: a Two-Stage Approach to Modelling Prosody in Context (2011.01175v2)

Published 2 Nov 2020 in eess.AS

Abstract: Prosody is an integral part of communication, but remains an open problem in state-of-the-art speech synthesis. There are two major issues faced when modelling prosody: (1) prosody varies at a slower rate compared with other content in the acoustic signal (e.g. segmental information and background noise); (2) determining appropriate prosody without sufficient context is an ill-posed problem. In this paper, we propose solutions to both these issues. To mitigate the challenge of modelling a slow-varying signal, we learn to disentangle prosodic information using a word level representation. To alleviate the ill-posed nature of prosody modelling, we use syntactic and semantic information derived from text to learn a context-dependent prior over our prosodic space. Our Context-Aware Model of Prosody (CAMP) outperforms the state-of-the-art technique, closing the gap with natural speech by 26%. We also find that replacing attention with a jointly-trained duration model improves prosody significantly.
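The abstract's two main ideas, a word-level prosody representation and a context-dependent prior over the prosodic latent space, can be sketched minimally. The code below is an illustrative assumption, not the paper's implementation: it pools frame-level acoustic features to word level (one common way to obtain a slow-varying, word-rate representation) and computes the diagonal-Gaussian KL term that would tie a text-derived context prior to the learned prosody posterior.

```python
import numpy as np

def pool_to_word_level(frame_feats, word_boundaries):
    """Average frame-level features within each word span.

    frame_feats: (T, D) array of frame-level acoustic features.
    word_boundaries: list of (start, end) frame indices, one pair per word.
    Returns an (n_words, D) word-level representation.
    """
    return np.stack([frame_feats[s:e].mean(axis=0) for s, e in word_boundaries])

def diag_gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians.

    mu_q, logvar_q: posterior parameters (e.g. from a prosody encoder).
    mu_p, logvar_p: prior parameters (e.g. predicted from syntactic/semantic
    text features). Summed over dimensions.
    """
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Hypothetical usage: 6 frames of 2-D features, two words of 3 frames each.
feats = np.arange(12, dtype=float).reshape(6, 2)
word_prosody = pool_to_word_level(feats, [(0, 3), (3, 6)])  # shape (2, 2)

# KL is zero when the context prior matches the posterior exactly.
mu = np.zeros(4)
kl = diag_gaussian_kl(mu, mu, mu, mu)
```

The word boundaries here would come from a forced alignment or, as the abstract suggests, from the jointly-trained duration model that replaces attention.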

Authors (9)
  1. Zack Hodari (6 papers)
  2. Alexis Moinet (22 papers)
  3. Sri Karlapati (13 papers)
  4. Jaime Lorenzo-Trueba (33 papers)
  5. Thomas Merritt (16 papers)
  6. Arnaud Joly (14 papers)
  7. Ammar Abbas (12 papers)
  8. Penny Karanasou (11 papers)
  9. Thomas Drugman (61 papers)
Citations (28)
