Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder (2008.06867v1)

Published 16 Aug 2020 in eess.AS, cs.CL, and cs.SD

Abstract: Recent work has shown that flow-based neural vocoders significantly improve real-time speech generation. A sequence of invertible flow operations allows the model to convert samples from a simple distribution into audio samples. However, training a continuous density model on discrete audio data can degrade performance because of the topological difference between the latent distribution and the actual data distribution. To resolve this problem, we propose audio dequantization methods in flow-based neural vocoders for high-fidelity audio generation. Data dequantization is a well-known method in image generation but has not yet been studied in the audio domain. We therefore implement various audio dequantization methods in a flow-based neural vocoder and investigate their effect on the generated audio. We conduct various objective performance assessments and a subjective evaluation, which show that audio dequantization can improve audio generation quality. In our experiments, audio dequantization produces waveform audio with better harmonic structure and fewer digital artifacts.
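
To make the discrete-versus-continuous mismatch concrete, here is a minimal sketch of uniform dequantization, the simplest variant from the image-generation literature, assuming signed 16-bit PCM input. It is an illustration of the general idea, not necessarily one of the specific methods the authors implemented; the names `SCALE`, `uniform_dequantize`, and `requantize` are hypothetical. Adding noise u ~ U[0, 1) to each integer level k spreads every discrete value over a bin of width one, so a continuous density model cannot concentrate probability mass on isolated integer points.

```python
import math
import random

# Assumption: signed 16-bit PCM, i.e. integers in [-32768, 32767].
SCALE = 2 ** 15


def uniform_dequantize(samples, rng=None):
    """Turn discrete 16-bit samples into continuous values in [-1, 1).

    Integer k is mapped to (k + u) / SCALE with u ~ U[0, 1), so each
    discrete level fills the half-open bin [k / SCALE, (k + 1) / SCALE).
    """
    rng = rng or random.Random()
    return [(k + rng.random()) / SCALE for k in samples]


def requantize(values):
    """Recover the integer bin that each continuous value fell into."""
    out = []
    for x in values:
        k = math.floor(x * SCALE)
        # Clamp to the int16 range; this also absorbs the rare case where
        # floating-point addition rounds k + u up to k + 1 at the top level.
        out.append(max(-SCALE, min(SCALE - 1, k)))
    return out


if __name__ == "__main__":
    pcm = [-32768, -1, 0, 1, 12345, 32767]
    noisy = uniform_dequantize(pcm, random.Random(0))
    assert requantize(noisy) == pcm  # the noise never leaves the original bin
```

Because the noise stays inside each level's bin, the original integers are recoverable by flooring, so nothing is lost at inference time; only the training targets change from a lattice of points to a continuous signal.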

Authors (4)
  1. Hyun-Wook Yoon (7 papers)
  2. Sang-Hoon Lee (24 papers)
  3. Hyeong-Rae Noh (2 papers)
  4. Seong-Whan Lee (132 papers)
Citations (11)
