VAW-GAN for Singing Voice Conversion with Non-parallel Training Data (2008.03992v3)

Published 10 Aug 2020 in eess.AS, cs.CL, and cs.SD

Abstract: Singing voice conversion aims to convert a singer's voice from source to target without changing the singing content. Parallel training data is typically required to train a singing voice conversion system, which is, however, not practical in real-life applications. Recent encoder-decoder structures, such as the variational autoencoding Wasserstein generative adversarial network (VAW-GAN), provide an effective way to learn a mapping from non-parallel training data. In this paper, we propose a singing voice conversion framework based on VAW-GAN. We train an encoder to disentangle singer identity and singing prosody (F0 contour) from phonetic content. By conditioning on singer identity and F0, the decoder generates output spectral features for an unseen target singer identity and improves the F0 rendering. Experimental results show that the proposed framework achieves better performance than the baseline frameworks.
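The abstract describes an encoder-decoder scheme in which a content code is extracted and the decoder is conditioned on singer identity and F0. A minimal NumPy sketch of that conditioning step follows; all dimensions, the random linear maps, and the function names are hypothetical stand-ins for illustration, not the paper's actual networks or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration (not from the paper).
T, D_SPEC = 100, 36      # number of frames, spectral feature dimension
D_Z, D_SPK = 16, 8       # content latent dimension, singer embedding dimension

def encode(spectra):
    """Stand-in for the VAE encoder: maps spectral frames to a
    singer-independent content code z (a random projection here)."""
    W = rng.standard_normal((spectra.shape[1], D_Z)) * 0.1
    return spectra @ W

def decode(z, singer_embedding, f0):
    """Stand-in for the conditioned decoder: the singer embedding and
    the per-frame F0 value are concatenated with the content code, so
    the output spectra depend on both conditions."""
    cond = np.concatenate(
        [z,
         np.tile(singer_embedding, (z.shape[0], 1)),  # broadcast singer ID over frames
         f0[:, None]],                                # one F0 value per frame
        axis=1)
    W = rng.standard_normal((cond.shape[1], D_SPEC)) * 0.1
    return cond @ W

# Toy inputs: source spectra, an F0 contour, and a target-singer embedding.
spectra = rng.standard_normal((T, D_SPEC))
f0 = np.abs(rng.standard_normal(T)) * 200.0
target_singer = rng.standard_normal(D_SPK)

z = encode(spectra)                          # disentangled content code
converted = decode(z, target_singer, f0)     # spectra with target identity
print(converted.shape)                       # (100, 36)
```

Swapping `target_singer` for a different embedding while keeping `z` and `f0` fixed is the conversion operation the abstract describes: content and prosody are preserved, identity changes.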

Authors (4)
  1. Junchen Lu (7 papers)
  2. Kun Zhou (217 papers)
  3. Haizhou Li (286 papers)
  4. Berrak Sisman (49 papers)
Citations (17)
