Combining speakers of multiple languages to improve quality of neural voices (2108.07737v1)

Published 17 Aug 2021 in cs.CL, cs.SD, and eess.AS

Abstract: In this work, we explore multiple architectures and training procedures for developing a multi-speaker and multi-lingual neural TTS system, with the goals of a) improving quality when the available data in the target language is limited and b) enabling cross-lingual synthesis. We report results from a large experiment using 30 speakers in 8 different languages across 15 different locales. The system is trained on the same amount of data per speaker. Compared to a single-speaker model, when the suggested system is fine-tuned to a speaker, it produces significantly better quality in most cases, while using less than $40\%$ of the data used to build the single-speaker model. In cross-lingual synthesis, the generated quality is on average within $80\%$ of native single-speaker models in terms of Mean Opinion Score.
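The abstract does not specify the architecture, but the general pattern it describes, a shared acoustic model conditioned on separate speaker and language embeddings so that any voice can be paired with any language, can be sketched as below. This is a minimal, hypothetical PyTorch sketch; all names and dimensions (MultiSpeakerTTS, d_model, the choice of GRUs, the frozen-encoder fine-tuning) are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class MultiSpeakerTTS(nn.Module):
    """Hypothetical multi-speaker, multi-lingual acoustic model: a shared
    text encoder/decoder conditioned on learned speaker and language
    embeddings, decoded to mel-spectrogram frames."""

    def __init__(self, n_phonemes: int = 128, n_speakers: int = 30,
                 n_languages: int = 8, d_model: int = 256, n_mels: int = 80):
        super().__init__()
        self.phoneme_emb = nn.Embedding(n_phonemes, d_model)
        # Disentangled speaker and language codes are what enable
        # cross-lingual synthesis: pair any speaker with any language.
        self.speaker_emb = nn.Embedding(n_speakers, d_model)
        self.language_emb = nn.Embedding(n_languages, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.mel_head = nn.Linear(d_model, n_mels)

    def forward(self, phonemes, speaker_id, language_id):
        # phonemes: (batch, seq); speaker_id, language_id: (batch,)
        x = self.phoneme_emb(phonemes)
        x, _ = self.encoder(x)
        # Broadcast the conditioning vector over every encoder timestep.
        cond = self.speaker_emb(speaker_id) + self.language_emb(language_id)
        x = x + cond.unsqueeze(1)
        x, _ = self.decoder(x)
        return self.mel_head(x)  # (batch, seq, n_mels)


model = MultiSpeakerTTS()

# Cross-lingual synthesis: speaker 3's voice with phonemes of language 5.
mel = model(torch.randint(0, 128, (1, 20)),
            torch.tensor([3]), torch.tensor([5]))

# Per-speaker fine-tuning: continue training the shared model on one
# speaker's data (here, illustratively, with the encoder frozen).
for p in model.encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```

The key design choice in this sketch is that speaker and language identity enter as additive conditioning on a shared encoder, so fine-tuning to a new speaker only needs to adapt the conditioning and decoder rather than relearn the text representation, which is consistent with the paper's finding that fine-tuning needs under 40% of a single-speaker model's data.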

Authors (5)
  1. Javier Latorre (6 papers)
  2. Charlotte Bailleul (1 paper)
  3. Tuuli Morrill (2 papers)
  4. Alistair Conkie (4 papers)
  5. Yannis Stylianou (15 papers)
Citations (8)
