KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis (2404.01033v2)
Abstract: This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications. KazEmoTTS is a collection of 54,760 audio-text pairs with a total duration of 74.85 hours, comprising 34.23 hours delivered by a female narrator and 40.62 hours by two male narrators. The emotions covered are "neutral", "angry", "happy", "sad", "scared", and "surprised". We also developed a TTS model trained on the KazEmoTTS dataset. Objective and subjective evaluations were used to assess the quality of the synthesized speech, yielding MCD scores ranging from 6.02 to 7.67 and MOS ratings from 3.51 to 3.57. To facilitate reproducibility and encourage further research, we have made our code, pre-trained model, and dataset available in our GitHub repository.
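For context on the objective metric, mel-cepstral distortion (MCD) compares the mel-cepstral coefficients of synthesized and reference speech frame by frame. The sketch below shows the standard frame-averaged formulation; the function name, array shapes, and the assumptions of pre-aligned frames and a dropped 0th (energy) coefficient are illustrative choices, not details taken from the paper.

```python
import numpy as np

def mel_cepstral_distortion(ref_mcep: np.ndarray, syn_mcep: np.ndarray) -> float:
    """Frame-averaged mel-cepstral distortion (MCD) in dB.

    ref_mcep, syn_mcep: arrays of shape (n_frames, n_coeffs) holding
    mel-cepstral coefficients of the reference and synthesized utterance.
    Assumes the two sequences are already time-aligned (e.g. via DTW)
    and that the 0th (energy) coefficient has been removed.
    """
    assert ref_mcep.shape == syn_mcep.shape, "inputs must be aligned"
    # Per-frame Euclidean distance between coefficient vectors.
    diff = ref_mcep - syn_mcep
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    # Scale to decibels and average over frames.
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))
```

Lower MCD indicates mel-cepstra closer to the reference recording, so the paper's 6.02 to 7.67 dB range is read as "lower is better" alongside the subjective MOS scores.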