Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels
Abstract: This paper presents a end-to-end framework for the F0 transformation in the context of expressive voice conversion. A single neural network is proposed, in which a first module is used to learn F0 representation over different temporal scales and a second adversarial module is used to learn the transformation from one emotion to another. The first module is composed of a convolution layer with wavelet kernels so that the various temporal scales of F0 variations can be efficiently encoded. The single decomposition/transformation network allows to learn in a end-to-end manner the F0 decomposition that are optimal with respect to the transformation, directly from the raw F0 signal.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.