
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation (2406.07422v1)

Published 11 Jun 2024 in eess.AS

Abstract: The multi-codebook speech codec enables the application of large language models (LLMs) to text-to-speech (TTS), but it bottlenecks efficiency and robustness because it requires multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook, single-sequence codec that employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically rich discrete sequence. Furthermore, the encoder is enhanced with 1) contextual modeling with a BLSTM module to exploit temporal information, 2) a hybrid sampling module to alleviate distortion from upsampling and downsampling, and 3) a resampling module to encourage discrete units to carry more phonetic information. Compared with multi-codebook codecs, e.g., EnCodec and TiCodec, Single-Codec demonstrates higher reconstruction quality at a lower bandwidth of only 304 bps. The effectiveness of Single-Codec is further validated by LLM-TTS experiments, which show improved naturalness and intelligibility.

Authors (9)
  1. Hanzhao Li (10 papers)
  2. Liumeng Xue (24 papers)
  3. Haohan Guo (22 papers)
  4. Xinfa Zhu (29 papers)
  5. Yuanjun Lv (12 papers)
  6. Lei Xie (337 papers)
  7. Yunlin Chen (7 papers)
  8. Hao Yin (66 papers)
  9. Zhifei Li (19 papers)
Citations (17)