Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning (2102.05630v1)

Published 10 Feb 2021 in cs.SD, cs.LG, and eess.AS

Abstract: Deep learning models are becoming predominant in many fields of machine learning, and Text-to-Speech (TTS), the process of synthesizing artificial speech from text, is no exception. Typically, a deep neural network is trained on a corpus of several hours of speech recorded from a single speaker. Producing the voice of a speaker other than the one learned is expensive and laborious, because it requires recording a new dataset and retraining the model. This is the main reason why TTS models are usually single-speaker. The proposed approach aims to overcome these limitations by building a system able to model a multi-speaker acoustic space. This allows the generation of speech audio similar to the voices of different target speakers, even if they were not observed during the training phase.

Authors (4)
  1. Giuseppe Ruggiero (6 papers)
  2. Enrico Zovato (7 papers)
  3. Luigi Di Caro (8 papers)
  4. Vincent Pollet (4 papers)
Citations (7)
