
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset (2301.00657v1)

Published 11 Dec 2022 in eess.AS, cs.AI, and cs.CL

Abstract: Text-to-Speech (TTS) synthesis for low-resource languages is an active research topic in both academia and industry. Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language, spoken by over 10 million people worldwide. However, open-source datasets for Mongolian TTS remain scarce. We therefore release MnTTS2, an open-source multi-speaker Mongolian TTS dataset, for the benefit of researchers in this area. In this work, we prepare transcriptions covering various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 hours of Mongolian speech, for 30 hours in total. Furthermore, we build a baseline system based on the state-of-the-art FastSpeech2 model and HiFi-GAN vocoder. The experimental results suggest that the MnTTS2 dataset is sufficient to build robust multi-speaker TTS models for real-world applications. The MnTTS2 dataset, training recipe, and pretrained models are released at: https://github.com/ssmlkl/MnTTS2

Authors (6)
  1. Kailin Liang
  2. Bin Liu
  3. Yifan Hu
  4. Rui Liu
  5. Feilong Bao
  6. Guanglai Gao
