Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network (2001.11686v1)

Published 31 Jan 2020 in eess.AS

Abstract: In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has successfully achieved high-quality and lightweight speech synthesis systems by combining a vocal tract LP filter with a WaveRNN-based vocal source (i.e., excitation) generator. However, the quality of synthesized speech is often unstable because the vocal source component is insufficiently represented by the mu-law quantization method, and the model is trained without considering the entire speech production mechanism. To address this problem, we first introduce the LP-MDN, which enables the autoregressive neural vocoder to structurally represent the interactions between the vocal tract and vocal source components. Then, we propose to incorporate the LP-MDN into the LPCNet vocoder by replacing the conventional discretized output with a continuous density distribution. The experimental results verify that the proposed system provides high-quality synthetic speech, achieving a mean opinion score of 4.41 within a text-to-speech framework.
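The core idea in the abstract is that the linear-prediction term and the excitation density can be combined structurally: if the network predicts a continuous mixture density over the excitation e_t, shifting the mixture means by the LP prediction p_t yields a density directly over the speech sample s_t = p_t + e_t. The following is a minimal illustrative sketch of that parameter shift, not the paper's implementation; the function names, the plain Gaussian-mixture form, and the sign convention p_t = Σ a_i s_{t-i} are assumptions.

```python
import numpy as np

def lp_mdn_speech_params(lpc, past_samples, weights, exc_means, exc_stds):
    """Turn mixture parameters over the excitation e_t into mixture
    parameters over the speech sample s_t = p_t + e_t.

    lpc:          hypothetical LP coefficients a_1..a_M
                  (assumed convention: p_t = sum_i a_i * s_{t-i})
    past_samples: previous samples s_{t-1}..s_{t-M}
    """
    p_t = float(np.dot(lpc, past_samples))      # vocal-tract LP prediction
    # Only the means shift; weights and scales of the excitation are reused,
    # so the vocal tract and vocal source interact inside one density.
    return weights, exc_means + p_t, exc_stds

def sample_speech(weights, means, stds, rng):
    """Draw one continuous sample s_t from the Gaussian mixture."""
    k = rng.choice(len(weights), p=weights)     # pick a mixture component
    return rng.normal(means[k], stds[k])        # sample from that Gaussian
```

Because the output is a continuous density rather than a 256-way mu-law softmax, sampling no longer quantizes the excitation, which is the instability the abstract attributes to the original LPCNet output.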

Authors (5)
  1. Min-Jae Hwang (13 papers)
  2. Eunwoo Song (19 papers)
  3. Ryuichi Yamamoto (34 papers)
  4. Frank Soong (9 papers)
  5. Hong-Goo Kang (36 papers)
Citations (11)
