Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
167 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Environment Aware Text-to-Speech Synthesis (2110.03887v4)

Published 8 Oct 2021 in eess.AS and cs.SD

Abstract: This study aims at designing an environment-aware text-to-speech (TTS) system that can generate speech to suit specific acoustic environments. It is also motivated by the desire to leverage massive data of speech audio from heterogeneous sources in TTS system development. The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network based speech synthesis. Two embedding extractors are trained with two purposely constructed datasets for characterization and disentanglement of speaker and environment factors in speech. A neural network model is trained to generate speech from extracted speaker and environment embeddings. Objective and subjective evaluation results demonstrate that the proposed TTS system is able to effectively disentangle speaker and environment factors and synthesize speech audio that carries designated speaker characteristics and environment attribute. Audio samples are available online for demonstration https://daxintan-cuhk.github.io/Environment-Aware-TTS/ .

Citations (3)

Summary

We haven't generated a summary for this paper yet.

Github Logo Streamline Icon: https://streamlinehq.com

GitHub