Conditional variational autoencoder to improve neural audio synthesis for polyphonic music sound (2211.08715v1)

Published 16 Nov 2022 in cs.SD, cs.LG, and eess.AS

Abstract: Deep generative models for audio synthesis have recently improved significantly. However, modeling raw waveforms remains a difficult problem, especially for music signals. Recently, the realtime audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. The RAVE method is based on the variational autoencoder and uses a two-stage training strategy. Unfortunately, the RAVE model is limited in reproducing wide-pitch polyphonic music sound. Therefore, to enhance reconstruction performance, we provide pitch activation data as auxiliary information to the RAVE model. To handle this auxiliary information, we propose an enhanced RAVE model with a conditional variational autoencoder structure and an additional fully-connected layer. To evaluate the proposed structure, we conducted a listening experiment based on multiple stimulus tests with hidden references and an anchor (MUSHRA) using the MAESTRO dataset. The results indicate that the proposed model achieves significantly better performance and stability than the conventional RAVE model.
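The abstract describes conditioning the RAVE autoencoder on pitch activations via an additional fully-connected layer. The paper does not specify the exact dimensions, nonlinearity, or fusion mechanism, so the following is only a minimal sketch of one plausible reading: project the pitch-activation vector through a hypothetical FC layer and concatenate the result with the latent code before decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration only (not taken from the paper):
# a 16-dim latent code, 88 pitch bins (piano range), an 8-dim conditioning embedding.
latent_dim, pitch_bins, cond_dim = 16, 88, 8

# Hypothetical fully-connected layer mapping pitch activations to a conditioning vector.
W = rng.standard_normal((pitch_bins, cond_dim)) * 0.01
b = np.zeros(cond_dim)

def condition(z, pitch_activation):
    """Fuse latent code z with projected pitch activation (CVAE-style conditioning)."""
    c = np.tanh(pitch_activation @ W + b)   # FC layer + nonlinearity (assumed choice)
    return np.concatenate([z, c], axis=-1)  # decoder input: [z ; c]

z = rng.standard_normal(latent_dim)
pitch = np.zeros(pitch_bins)
pitch[[40, 44, 47]] = 1.0                   # example: three active pitches (a triad)
dec_in = condition(z, pitch)
print(dec_in.shape)                         # (24,)
```

The decoder would then consume this concatenated vector instead of the latent alone; other fusion schemes (e.g. feature-wise modulation) are equally compatible with the abstract's description.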

Authors (6)
  1. Seokjin Lee (2 papers)
  2. Minhan Kim (1 paper)
  3. Seunghyeon Shin (2 papers)
  4. Daeho Lee (9 papers)
  5. Inseon Jang (7 papers)
  6. Wootaek Lim (3 papers)
Citations (1)
