
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis (2307.07218v4)

Published 14 Jul 2023 in eess.AS and cs.SD

Abstract: Zero-shot text-to-speech (TTS) aims to synthesize voices from unseen speech prompts, which skips the fine-tuning process and thereby significantly reduces the data and computation required for voice cloning. However, the prompting mechanisms of zero-shot TTS still face two challenges: 1) previous zero-shot TTS works are typically trained with single-sentence prompts, which significantly limits their performance when relatively abundant data is available at inference time; 2) the prosodic information in prompts is highly coupled with timbre, so prosody and timbre cannot be transferred independently of each other. This paper introduces Mega-TTS 2, a generic prompting mechanism for zero-shot TTS, to tackle these challenges. Specifically, we design a powerful acoustic autoencoder that encodes prosody and timbre information separately into a compressed latent space while providing high-quality reconstructions. We then propose a multi-reference timbre encoder and a prosody latent LLM (P-LLM) to extract useful information from multi-sentence prompts. We further leverage the probabilities derived from multiple P-LLM outputs to produce transferable and controllable prosody. Experimental results demonstrate that Mega-TTS 2 can not only synthesize identity-preserving speech from a short prompt of an unseen speaker from arbitrary sources, but also consistently outperform the fine-tuning method when the volume of data ranges from 10 seconds to 5 minutes. Furthermore, our method can transfer various speaking styles to the target timbre in a fine-grained and controlled manner. Audio samples can be found at https://boostprompt.github.io/boostprompt/.
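The abstract says prosody becomes transferable and controllable by combining the probabilities from multiple P-LLM outputs, but it does not state the combination rule. The sketch below is only an illustration under a loud assumption: it treats each P-LLM output as a next-token distribution over a discrete prosody codebook (one conditioned on a prompt from speaker A, one on a prompt from speaker B) and mixes them linearly with a hypothetical weight `gamma`. The function names, the linear mixing rule, and the toy logits are all illustrative inventions, not the authors' code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_distributions(
    p_a: Sequence[float], p_b: Sequence[float], gamma: float
) -> List[float]:
    """ASSUMED fusion rule (hypothetical): gamma * p_a + (1 - gamma) * p_b."""
    if len(p_a) != len(p_b):
        raise ValueError("both distributions must cover the same prosody codebook")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    mixed = [gamma * a + (1.0 - gamma) * b for a, b in zip(p_a, p_b)]
    total = sum(mixed)
    # Renormalize to absorb floating-point drift.
    return [m / total for m in mixed]


def pick_prosody_code(
    logits_a: Sequence[float], logits_b: Sequence[float], gamma: float
) -> int:
    """Greedily choose the next prosody code index from the mixed distribution."""
    mixed = interpolate_distributions(softmax(logits_a), softmax(logits_b), gamma)
    return max(range(len(mixed)), key=mixed.__getitem__)


if __name__ == "__main__":
    # Toy logits over a 3-entry codebook; values are invented for illustration only.
    style_a = [2.0, 0.5, 0.1]  # hypothetical P-LLM output given prompt A
    style_b = [0.1, 0.5, 2.0]  # hypothetical P-LLM output given prompt B
    for g in (1.0, 0.8, 0.2, 0.0):
        print(g, pick_prosody_code(style_a, style_b, g))
```

With these toy logits, sliding `gamma` from 1.0 down to 0.0 moves the greedy choice from code 0 (favored by prompt A) to code 2 (favored by prompt B), which is the kind of continuous control knob the abstract alludes to. A real system would decode a whole sequence autoregressively and might mix log-probabilities instead of probabilities; the linear rule here is simply the easiest to read.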

Authors (13)
  1. Ziyue Jiang (38 papers)
  2. Jinglin Liu (38 papers)
  3. Yi Ren (215 papers)
  4. Chen Zhang (403 papers)
  5. Zhenhui Ye (25 papers)
  6. Pengfei Wei (21 papers)
  7. Chunfeng Wang (6 papers)
  8. Xiang Yin (99 papers)
  9. Zejun Ma (78 papers)
  10. Zhou Zhao (218 papers)
  11. Shengpeng Ji (26 papers)
  12. Qian Yang (146 papers)
  13. JinZheng He (22 papers)
Citations (28)