
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech (2406.08076v1)

Published 12 Jun 2024 in eess.AS and cs.SD

Abstract: Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source language and subsequently transferring them to a target language using cross-lingual TTS techniques. While previous approaches have mainly concentrated on controlling voice identity within the cross-lingual TTS framework, there has been limited work on incorporating emotion and voice identity together. To this end, we introduce an end-to-end Voice Identity and Emotional Style Controllable Cross-Lingual (VECL) TTS system using multilingual speakers and an emotion embedding network. Moreover, we introduce content and style consistency losses to further enhance the quality of synthesized speech. The proposed system achieved an average relative improvement of 8.83% compared to the state-of-the-art (SOTA) methods on a database comprising English and three Indian languages (Hindi, Telugu, and Marathi).

Authors (5)
  1. Ashishkumar Gudmalwar (3 papers)
  2. Nirmesh Shah (5 papers)
  3. Sai Akarsh (2 papers)
  4. Pankaj Wasnik (22 papers)
  5. Rajiv Ratn Shah (108 papers)
Citations (1)