
A Novel Cross-Lingual Voice Cloning Approach with a Few Text-Free Samples (1910.13276v2)

Published 29 Oct 2019 in eess.AS, cs.CL, and cs.SD

Abstract: In this paper, we present a cross-lingual voice cloning approach. Bottleneck (BN) features obtained from a speaker-independent ASR (SI-ASR) model are used as a bridge across speaker and language boundaries. The relationships between text and BN features are modeled by a latent prosody model. An acoustic model learns the translation from BN features to acoustic features, and it is fine-tuned with a few samples of the target speaker to realize voice cloning. The system can generate speech for arbitrary utterances in the target language in the voice of a speaker of another language. We verify that, with a small amount of audio data, the proposed approach handles cross-lingual tasks well. In intra-lingual tasks, the proposed approach also performs better than the baseline approach in naturalness and similarity.
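
The data flow described in the abstract can be illustrated with a toy sketch. Everything below is a hypothetical stand-in and not the authors' implementation: `si_asr_bn`, `latent_prosody_model`, `AcousticModel`, the feature size `BN_DIM`, and the sample numbers are all assumptions made for illustration, and simple arithmetic replaces the neural networks a real system would use. The sketch only shows the shape of the idea: target-speaker audio is turned into BN features without any transcript, the acoustic model is fine-tuned on those (BN, acoustic) pairs, and new text is then synthesized through the text-to-BN and BN-to-acoustic stages.

```python
import statistics

BN_DIM = 3  # hypothetical feature size; the abstract gives no dimensions


def si_asr_bn(acoustic_frames):
    """Toy stand-in for the SI-ASR bottleneck extractor (audio only, no text)."""
    # Per-dimension mean/variance normalisation loosely mimics removing speaker traits.
    cols = list(zip(*acoustic_frames))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in acoustic_frames]


def latent_prosody_model(text):
    """Toy stand-in for the text-to-BN stage: one made-up BN frame per character."""
    return [[((ord(ch) * (d + 1)) % 7 - 3) / 3.0 for d in range(BN_DIM)] for ch in text]


class AcousticModel:
    """Toy stand-in: per-dimension affine map from BN to acoustic features."""

    def __init__(self):
        self.w = [1.0] * BN_DIM
        self.b = [0.0] * BN_DIM

    def forward(self, bn_frames):
        return [[w * x + b for x, w, b in zip(f, self.w, self.b)] for f in bn_frames]

    def loss(self, bn_frames, targets):
        preds = self.forward(bn_frames)
        errs = [(p - t) ** 2 for pf, tf in zip(preds, targets) for p, t in zip(pf, tf)]
        return sum(errs) / len(errs)

    def fine_tune(self, bn_frames, targets, steps=200, lr=0.1):
        """Plain gradient descent on the per-dimension mean squared error."""
        n = len(bn_frames)
        for _ in range(steps):
            preds = self.forward(bn_frames)
            for d in range(BN_DIM):
                gw = sum(2 * (p[d] - t[d]) * x[d]
                         for p, t, x in zip(preds, targets, bn_frames)) / n
                gb = sum(2 * (p[d] - t[d]) for p, t in zip(preds, targets)) / n
                self.w[d] -= lr * gw
                self.b[d] -= lr * gb


def synthesize(text, acoustic_model):
    """Text -> BN features -> acoustic features in the fine-tuned voice."""
    return acoustic_model.forward(latent_prosody_model(text))


# A few text-free target-speaker frames (made-up numbers, no transcript involved).
target_audio = [[2.0, 0.5, -1.0], [2.4, 0.1, -0.6], [1.6, 0.9, -1.4], [2.2, 0.3, -0.8]]
target_bn = si_asr_bn(target_audio)

model = AcousticModel()
loss_before = model.loss(target_bn, target_audio)
model.fine_tune(target_bn, target_audio)
loss_after = model.loss(target_bn, target_audio)
cloned = synthesize("hello", model)
```

In this sketch the transcript of the target speaker's audio is never used during fine-tuning, which is the sense in which the samples are text-free; text enters only at synthesis time through the text-to-BN stage.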

Authors (4)
  1. Xinyong Zhou (3 papers)
  2. Hao Che (10 papers)
  3. Xiaorui Wang (30 papers)
  4. Lei Xie (339 papers)
Citations (4)
