Voice conversion using coefficient mapping and neural network (2003.05184v1)

Published 11 Mar 2020 in eess.AS

Abstract: This research presents a voice conversion model using coefficient mapping and a neural network. Most previous works on parametric speech synthesis did not account for losses in spectral detail, which cause over-smoothing and, invariably, an appreciable deviation of the converted speech from the target speaker. An improved model that uses both linear predictive coding (LPC) and line spectral frequency (LSF) coefficients to parametrize the source speech signal was developed in this work to reveal the effect of over-smoothing. The non-linear mapping ability of a neural network was employed to map the source speech vectors into the acoustic vector space of the target. Training a neural network on LPC coefficients yielded a poor result due to the instability of the LPC filter poles. The LPC coefficients were therefore converted to line spectral frequency coefficients before being trained with a 3-layer neural network. The algorithm was tested with noisy data, and the result was evaluated using the Mel-Cepstral Distance measurement. Cepstral distance evaluation shows a 35.7 percent reduction in the spectral distance between the target and the converted speech.
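
The two signal-processing steps named in the abstract, converting LPC coefficients to LSFs and scoring the output with a mel-cepstral distance, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `lpc_to_lsf` and `mel_cepstral_distance`, the coefficient convention `[1, a1, ..., ap]`, the root-finding approach, and the commonly used dB-scaled distance formula that skips the 0th (energy) coefficient are all assumptions made for illustration; the paper may define its measure differently.

```python
import numpy as np


def lpc_to_lsf(a, eps=1e-6):
    """Convert LPC coefficients [1, a1, ..., ap] to p LSFs in radians.

    Hypothetical helper: assumes A(z) is minimum phase (a stable filter),
    so the roots of the sum and difference polynomials lie on the unit circle.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    # Symmetric (P) and antisymmetric (Q) polynomials of degree p + 1.
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one angle per conjugate pair and drop the trivial roots at z = +1, -1.
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


def mel_cepstral_distance(ref, conv):
    """Mean mel-cepstral distance in dB between two aligned frame sequences.

    ref, conv: arrays of shape (frames, coeffs). Column 0 (energy) is skipped,
    which is a common convention and an assumption here.
    """
    ref = np.asarray(ref, dtype=float)
    conv = np.asarray(conv, dtype=float)
    diff = ref[:, 1:] - conv[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    # Second-order stable filter with poles at 0.6 +/- 0.374j.
    print(lpc_to_lsf([1.0, -1.2, 0.5]))
    print(mel_cepstral_distance([[0.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]))
```

For a stable filter the LSFs lie strictly between 0 and π and come out in ascending order, so a network output can be checked for validity by testing that ordering, which is harder to do directly on LPC coefficients.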

Citations (8)
