End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning (1904.06508v2)

Published 13 Apr 2019 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: End-to-end text-to-speech (TTS) has shown great success given large quantities of paired text and speech data. However, laborious data collection remains difficult for at least 95% of the world's languages, which hinders the development of TTS in those languages. In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available. We show that such TTS systems can be effectively constructed by transferring knowledge from a high-resource (source) language. Since a model trained on the source language cannot be directly applied to the target language due to input space mismatch, we propose a method to learn a mapping between source and target linguistic symbols. Thanks to this learned mapping, pronunciation information is preserved throughout the transfer procedure. Preliminary experiments show that only around 15 minutes of paired data are needed to obtain a relatively good TTS system. Furthermore, analytic studies demonstrate that the automatically discovered mapping correlates well with phonetic expertise.
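The core idea of the cross-lingual transfer described above is to relate target-language input symbols to source-language ones so that pronunciation knowledge carries over. A minimal sketch of one way such a mapping could be resolved is shown below: each target-symbol embedding is matched to the nearest source-phoneme embedding by cosine similarity. This is an illustrative assumption, not the authors' actual training procedure (the paper learns the mapping jointly with the model); all names and vectors here are hypothetical.

```python
import numpy as np

def nearest_source_symbol(target_vec, source_embeddings):
    """Index of the source-language symbol whose embedding is closest
    (by cosine similarity) to a target-language symbol embedding.
    Illustrative only: real systems learn this mapping end to end."""
    t = target_vec / np.linalg.norm(target_vec)
    S = source_embeddings / np.linalg.norm(source_embeddings, axis=1, keepdims=True)
    return int(np.argmax(S @ t))

# Toy example: three hypothetical source-phoneme embeddings and
# one target-symbol embedding that lies near the first of them.
source = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7]])
target = np.array([0.9, 0.1])
idx = nearest_source_symbol(target, source)  # -> 0
```

Once every target symbol is associated with a source symbol (or a soft mixture of them), the pretrained source-language TTS front end can be reused, which is what lets the system get by with only minutes of paired target-language data.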

Authors (4)
  1. Tao Tu (45 papers)
  2. Yuan-Jui Chen (3 papers)
  3. Cheng-chieh Yeh (4 papers)
  4. Hung-yi Lee (327 papers)
Citations (82)
