Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder (1910.06464v1)

Published 14 Oct 2019 in cs.LG, cs.SD, eess.AS, and stat.ML

Abstract: In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.
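As a rough illustration of the bit rates quoted in the abstract, the sketch below computes the rate of a stream of vector-quantized codebook indices and compares 1.6 kbps with the two reference codec rates. The function name `vq_bitrate_bps` and the configuration of 200 codes per second drawn from a 256-entry codebook are hypothetical values chosen only because they happen to yield 1.6 kbps. They are not taken from the paper, whose actual frame rate and codebook size are not given here.

```python
import math


def vq_bitrate_bps(codes_per_second: float, codebook_size: int,
                   codebooks_per_frame: int = 1) -> float:
    """Bits per second when each codebook index is sent with a fixed-length code."""
    bits_per_index = math.log2(codebook_size)
    return codes_per_second * codebooks_per_frame * bits_per_index


# Hypothetical configuration (an assumption, not the paper's setup):
# 200 indices per second from a single 256-entry codebook.
rate_bps = vq_bitrate_bps(codes_per_second=200, codebook_size=256)

# Rates quoted in the abstract, in bits per second.
VQ_VAE_BPS = 1600
MELP_BPS = 2400
AMR_WB_BPS = 23050

# How many times more bits the reference codecs use than the 1.6 kbps model.
melp_ratio = MELP_BPS / VQ_VAE_BPS
amr_wb_ratio = AMR_WB_BPS / VQ_VAE_BPS

if __name__ == "__main__":
    print(f"hypothetical VQ stream: {rate_bps:.0f} bps")
    print(f"MELP / VQ-VAE rate ratio: {melp_ratio:.2f}")
    print(f"AMR-WB / VQ-VAE rate ratio: {amr_wb_ratio:.2f}")
```

Under these assumptions, the arithmetic shows the scale of the claim: AMR-WB at 23.05 kbps spends roughly 14 times as many bits as the 1.6 kbps model.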

Authors (7)
  1. Yazhe Li (17 papers)
  2. Felicia S C Lim (7 papers)
  3. Alejandro Luebs (6 papers)
  4. Oriol Vinyals (116 papers)
  5. Thomas C Walters (3 papers)
  6. Cristina Gârbacea (2 papers)
  7. Aäron van den Oord (14 papers)
Citations (108)
