
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data (2306.16083v1)

Published 28 Jun 2023 in cs.SD and eess.AS

Abstract: We propose UnitSpeech, a speaker-adaptive speech synthesis method that fine-tunes a diffusion-based text-to-speech (TTS) model using minimal untranscribed data. To achieve this, we use the self-supervised unit representation as a pseudo transcript and integrate the unit encoder into the pre-trained TTS model. We train the unit encoder to provide speech content to the diffusion-based decoder and then fine-tune the decoder for speaker adaptation to the reference speaker using a single $<$unit, speech$>$ pair. UnitSpeech performs speech synthesis tasks such as TTS and voice conversion (VC) in a personalized manner without requiring model re-training for each task. UnitSpeech achieves comparable and superior results on personalized TTS and any-to-any VC tasks compared to previous baselines. Our model also shows widespread adaptive performance on real-world data and other tasks that use a unit sequence as input.
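As a rough illustration of what using a unit sequence as a "pseudo transcript" can mean, the sketch below quantizes per-frame speech features to the nearest entry of a codebook and collapses consecutive repeats into (unit, duration) pairs. This is an assumption for illustration only: the abstract does not state how units are extracted, and the function names, toy centroids, and two-dimensional features here are hypothetical stand-ins for a real self-supervised model and its learned clusters.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_unit(frame: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centroids)), key=lambda k: sq_dist(frame, centroids[k]))


def frames_to_units(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map each feature frame to a discrete unit ID (hypothetical quantizer)."""
    return [nearest_unit(f, centroids) for f in frames]


def collapse_repeats(units: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge runs of identical units into (unit, run_length) pairs."""
    return [(u, len(list(run))) for u, run in groupby(units)]


# Toy data: three codebook entries and six feature frames (all hypothetical).
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.2), (1.1, 0.8), (0.8, 1.0), (0.1, 1.9)]

units = frames_to_units(frames, centroids)
pseudo_transcript = collapse_repeats(units)
print(units)              # [0, 0, 1, 1, 1, 2]
print(pseudo_transcript)  # [(0, 2), (1, 3), (2, 1)]
```

In this reading, the collapsed unit sequence plays the role that text plays in ordinary TTS training, which is why no human transcript of the reference speech is needed.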

Authors (4)
  1. Heeseung Kim (16 papers)
  2. Sungwon Kim (32 papers)
  3. Jiheum Yeom (6 papers)
  4. Sungroh Yoon (163 papers)
Citations (14)
