A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling (2106.06500v2)

Published 11 Jun 2021 in cs.SD and eess.AS

Abstract: The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, that not only model the latent space, but also model the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called Dynamical Variational Autoencoders (DVAEs). In the present paper, we present the results of an experimental benchmark comparing six of those DVAE models on the speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.

Authors (5)
  1. Xiaoyu Bie
  2. Laurent Girin
  3. Simon Leglaive
  4. Thomas Hueber
  5. Xavier Alameda-Pineda
Citations (12)
