
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion (2110.10326v2)

Published 20 Oct 2021 in eess.AS and cs.SD

Abstract: Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Due to the hierarchical structure of speech emotion, it is challenging to disentangle the emotional style for different speakers. Inspired by the recent success of speaker disentanglement with the variational autoencoder (VAE), we propose an any-to-any expressive voice conversion framework called StyleVC. StyleVC is designed to disentangle linguistic content, speaker identity, pitch, and emotional style information. We study the use of a style encoder to model emotional style explicitly. At run-time, StyleVC converts both speaker identity and emotional style for arbitrary speakers. Experiments validate the effectiveness of our proposed framework in both objective and subjective evaluations.

Authors (4)
  1. Zongyang Du (7 papers)
  2. Berrak Sisman (49 papers)
  3. Kun Zhou (217 papers)
  4. Haizhou Li (286 papers)
Citations (23)
