Learning Latent Representations for Speech Generation and Transformation (1704.04222v2)

Published 13 Apr 2017 in cs.CL, cs.LG, and stat.ML

Abstract: An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as Variational Autoencoders (VAEs) have achieved tremendous success in modeling natural images. In this paper, we apply a convolutional VAE to model the generative process of natural speech. We derive latent space arithmetic operations to disentangle learned latent representations. We demonstrate the capability of our model to modify the phonetic content or the speaker identity for speech segments using the derived operations, without the need for parallel supervisory data.
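
To make the abstract's "latent space arithmetic" concrete, below is a minimal sketch of the mean-difference idea: encode groups of segments that share an attribute (a phone, or a speaker), take the difference of their average latent codes as an attribute vector, and shift a segment's latent code along that vector before decoding. This is an illustrative PyTorch-style sketch, not the paper's implementation; `encoder`, `decoder`, `latent_mean`, and `modify_attribute` are hypothetical names, and the paper derives its operations more carefully from the VAE's Gaussian posteriors.

```python
import torch

def latent_mean(encoder, segments):
    # Average the posterior means over segments sharing one attribute
    # (e.g. all segments of a given phone, or from a given speaker).
    # Assumes encoder(x) -> (mu, logvar), as in a standard VAE.
    mus = [encoder(x)[0] for x in segments]
    return torch.stack(mus).mean(dim=0)

def modify_attribute(encoder, decoder, x, src_segments, tgt_segments):
    # Attribute vector: mean latent code of the target group minus the
    # mean latent code of the source group (latent space arithmetic).
    mu, _ = encoder(x)
    v = latent_mean(encoder, tgt_segments) - latent_mean(encoder, src_segments)
    # Shifting x's latent code by v and decoding changes the attribute
    # (phonetic content or speaker identity) without parallel data.
    return decoder(mu + v)
```

Because the operation only needs groups of segments labeled by the attribute being swapped, no parallel utterance pairs are required, which is the point the abstract emphasizes.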

Authors (3)
  1. Wei-Ning Hsu (76 papers)
  2. Yu Zhang (1400 papers)
  3. James Glass (173 papers)
Citations (144)
