Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation (2207.14607v1)

Published 29 Jul 2022 in eess.AS and cs.SD

Abstract: The availability of data in expressive styles across languages is limited, and recording sessions are costly and time-consuming. To overcome these issues, we demonstrate how to build low-resource, neural text-to-speech (TTS) voices with only 1 hour of conversational speech, when no other conversational data are available in the same language. Assuming the availability of non-expressive speech data in that language, we propose a 3-step technology: 1) we train an F0-conditioned voice conversion (VC) model as a data augmentation technique; 2) we train an F0 predictor to control the conversational flavour of the voice-converted synthetic data; 3) we train a TTS system that consumes the augmented data. We show that our technology enables F0 controllability, is scalable across speakers and languages, and is competitive in terms of naturalness with a state-of-the-art baseline model, an alternative augmentation method that does not make use of F0 information.
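
The abstract outlines a three-step pipeline: an F0-conditioned voice conversion model used for data augmentation, an F0 predictor that supplies conversational-style pitch contours, and a TTS system trained on the augmented data. The Python sketch below only illustrates how that data flow could be wired together; the class names (F0ConditionedVC, F0Predictor), the mel-spectrogram shapes, and the toy F0 contour are hypothetical placeholders and not the authors' implementation.

```python
# Hypothetical sketch of the 3-step augmentation pipeline described in the abstract.
# All class and function names are illustrative placeholders; the actual VC,
# F0-predictor, and TTS architectures are not specified on this page.

import numpy as np


class F0ConditionedVC:
    """Step 1 (placeholder): voice conversion conditioned on an F0 contour.

    Intended role: map a source speaker's conversational speech to the target
    speaker's voice while following the provided F0 trajectory."""

    def convert(self, source_mel: np.ndarray, f0: np.ndarray) -> np.ndarray:
        # A real implementation would run a neural VC model; here we return a
        # copy so the end-to-end data flow stays runnable.
        return source_mel.copy()


class F0Predictor:
    """Step 2 (placeholder): predicts conversational-style F0 contours for the
    target speaker, controlling the 'conversational flavour' of the
    voice-converted synthetic data."""

    def predict(self, n_frames: int) -> np.ndarray:
        # Toy stand-in: a smooth contour around 120 Hz with mild variation.
        t = np.linspace(0.0, 1.0, n_frames)
        return 120.0 + 20.0 * np.sin(2 * np.pi * 3 * t)


def build_augmented_corpus(conversational_mels, vc, f0_predictor):
    """Turn a small amount (~1 hour) of source-speaker conversational speech
    into target-speaker conversational training data for step 3 (TTS training)."""
    augmented = []
    for mel in conversational_mels:
        f0 = f0_predictor.predict(n_frames=mel.shape[0])
        augmented.append(vc.convert(mel, f0))
    return augmented


if __name__ == "__main__":
    # Fake conversational mel-spectrograms from a source speaker (frames x mel bins).
    corpus = [np.random.randn(200, 80) for _ in range(3)]
    augmented = build_augmented_corpus(corpus, F0ConditionedVC(), F0Predictor())
    print(f"Augmented utterances: {len(augmented)}, frames each: {augmented[0].shape}")
    # Step 3: the augmented data would be pooled with the target speaker's
    # non-expressive recordings to train the final conversational TTS voice.
```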

Authors (5)
  1. Giulia Comini (7 papers)
  2. Goeric Huybrechts (15 papers)
  3. Manuel Sam Ribeiro (15 papers)
  4. Jaime Lorenzo-Trueba (33 papers)
  5. Adam Gabrys (8 papers)
Citations (5)
