DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech (2306.14145v1)

Published 25 Jun 2023 in cs.SD, cs.CL, and eess.AS

Abstract: Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory, as it is difficult to accurately retain the speakers' timbres (i.e., speaker similarity) while eliminating the accents of their first language (i.e., nativeness). In this paper, we demonstrate that vector-quantized (VQ) acoustic features contain less speaker information than mel-spectrograms. Based on this finding, we propose a novel dual speaker embedding TTS (DSE-TTS) framework for CTTS with an authentic speaking style. Here, one embedding is fed to the acoustic model to learn the linguistic speaking style, while the other is integrated into the vocoder to mimic the target speaker's timbre. Experiments show that by combining both embeddings, DSE-TTS significantly outperforms the state-of-the-art SANE-TTS in cross-lingual synthesis, especially in terms of nativeness.
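The dual-embedding routing described in the abstract can be sketched as a toy Python pipeline. This is a hypothetical illustration and not the authors' implementation. The functions `acoustic_model`, `vocoder`, `dse_tts` and the `speakers` table are made-up stand-ins for neural networks. For illustration, the sketch also assumes that during cross-lingual inference a native speaker of the target language supplies the acoustic-model embedding. The example is only meant to show the routing: the style embedding conditions the text-to-VQ-token stage, and the timbre embedding conditions only the token-to-waveform stage.

```python
from typing import Dict, List

Embedding = List[float]

def acoustic_model(phonemes: List[str], style_emb: Embedding,
                   codebook_size: int = 8) -> List[int]:
    """Hypothetical toy acoustic model: phonemes -> VQ token indices.

    The style embedding shifts every token by a fixed offset, standing in
    for the speaking-style conditioning a real neural model would learn.
    """
    offset = round(sum(style_emb))
    return [(sum(map(ord, p)) + offset) % codebook_size for p in phonemes]

def vocoder(tokens: List[int], timbre_emb: Embedding,
            samples_per_token: int = 4) -> List[float]:
    """Hypothetical toy vocoder: VQ tokens -> waveform samples.

    Each token is repeated samples_per_token times and scaled by the first
    component of the timbre embedding, standing in for voice conditioning.
    """
    gain = timbre_emb[0]
    return [t * gain for t in tokens for _ in range(samples_per_token)]

def dse_tts(phonemes: List[str], style_speaker: str, timbre_speaker: str,
            speakers: Dict[str, Embedding]) -> List[float]:
    # Embedding 1 conditions the acoustic model (speaking style / nativeness).
    tokens = acoustic_model(phonemes, speakers[style_speaker])
    # Embedding 2 conditions only the vocoder (target speaker's timbre).
    return vocoder(tokens, speakers[timbre_speaker])

# Assumed setup: a native speaker of the target language supplies the style
# embedding, while the (non-native) target speaker supplies the timbre.
speakers = {"native_style": [0.5, 0.5], "target_voice": [2.0, 0.0]}
audio = dse_tts(["a", "b"], "native_style", "target_voice", speakers)
print(audio)  # [4.0, 4.0, 4.0, 4.0, 6.0, 6.0, 6.0, 6.0]
```

The timbre embedding never reaches the acoustic model. As a result, changing `timbre_speaker` alters only the vocoder output and leaves the VQ tokens unchanged. This mirrors the separation the abstract motivates with its finding that VQ features carry less speaker information than mel-spectrograms.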

Authors (5)
  1. Sen Liu (35 papers)
  2. Yiwei Guo (30 papers)
  3. Chenpeng Du (28 papers)
  4. Xie Chen (166 papers)
  5. Kai Yu (202 papers)
Citations (6)
