Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models (2407.13509v1)

Published 18 Jul 2024 in cs.SD, cs.CL, cs.LG, and eess.AS

Abstract: Spontaneous style speech synthesis, which aims to generate human-like speech, often encounters challenges due to the scarcity of high-quality data and limitations in model capabilities. Recent LLM-based TTS systems can be trained on large, diverse, and low-quality speech datasets, resulting in highly natural synthesized speech. However, they are limited by the difficulty of simulating various spontaneous behaviors and capturing prosody variations in spontaneous speech. In this paper, we propose a novel spontaneous speech synthesis system based on LLMs. We systematically categorize and uniformly model diverse spontaneous behaviors. Moreover, fine-grained prosody modeling is introduced to enhance the model's ability to capture subtle prosody variations in spontaneous speech. Experimental results show that our proposed method significantly outperforms the baseline methods in terms of prosody naturalness and spontaneous behavior naturalness.

Authors (8)
  1. Weiqin Li (7 papers)
  2. Peiji Yang (5 papers)
  3. Yicheng Zhong (3 papers)
  4. Yixuan Zhou (30 papers)
  5. Zhisheng Wang (15 papers)
  6. Zhiyong Wu (171 papers)
  7. Xixin Wu (85 papers)
  8. Helen Meng (204 papers)
Citations (3)