Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis (2308.16593v1)

Published 31 Aug 2023 in cs.SD, cs.CL, cs.LG, and eess.AS

Abstract: The spontaneous behavior that often occurs in conversations makes speech sound more human-like than reading-style speech. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavior labels. During semi-supervised learning, both text and speech information are considered when detecting spontaneous behavior labels in speech. Moreover, a linguistic-aware encoder is used to model the relationship between each sentence in the conversation. Experimental results indicate that our proposed method achieves superior expressive speech synthesis performance, with the ability to model spontaneous behavior in spontaneous-style speech and predict reasonable spontaneous behavior from text.

Authors (7)
  1. Weiqin Li (7 papers)
  2. Shun Lei (21 papers)
  3. Qiaochu Huang (7 papers)
  4. Yixuan Zhou (30 papers)
  5. Zhiyong Wu (171 papers)
  6. Shiyin Kang (27 papers)
  7. Helen Meng (204 papers)
Citations (4)
