
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation (2409.04016v1)

Published 6 Sep 2024 in cs.SD and eess.AS

Abstract: Neural audio codec tokens serve as the fundamental building blocks for speech language model (SLM)-based speech generation. However, there is no systematic understanding of how the codec system affects the speech generation performance of the SLM. In this work, we examine codec tokens within the SLM framework for speech generation to provide insights for effective codec design. We retrain existing high-performing neural codec models on the same dataset and loss functions to compare their performance in a uniform setting. We integrate codec tokens into two SLM systems: a mask-based parallel speech generation system and an auto-regressive (AR) plus non-auto-regressive (NAR) model-based system. Our findings indicate that better speech reconstruction in codec systems does not guarantee improved speech generation in SLMs. A high-quality codec decoder is crucial for natural speech production in SLMs, while speech intelligibility depends more on the quantization mechanism.
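The "quantization mechanism" the abstract highlights is the step that turns continuous encoder features into the discrete tokens an SLM models. As a rough illustration only (not the authors' code), below is a minimal NumPy sketch of residual vector quantization (RVQ), the scheme used by codecs such as SoundStream and EnCodec; all names, shapes, and sizes here are illustrative assumptions.

```python
import numpy as np

def rvq_encode(frames, codebooks):
    """Residual vector quantization: each frame is quantized in stages,
    where stage i encodes the residual left over by stages 0..i-1.
    frames:    (T, D) array of encoder outputs, one D-dim vector per frame.
    codebooks: list of (K, D) arrays, one codebook per quantizer stage.
    Returns (T, n_stages) integer token indices -- the "codec tokens"
    an SLM would model -- plus the reconstructed vectors."""
    residual = frames.copy()
    recon = np.zeros_like(frames)
    tokens = []
    for cb in codebooks:
        # Nearest codeword for each residual vector (squared L2 distance).
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)
        tokens.append(idx)
        chosen = cb[idx]
        recon += chosen
        residual -= chosen  # later stages refine what earlier ones missed
    return np.stack(tokens, axis=1), recon

# Toy usage: 100 frames of 8-dim features, 4 quantizer stages of 256 codes each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 8))
codebooks = [rng.normal(size=(256, 8)) for _ in range(4)]
tokens, recon = rvq_encode(frames, codebooks)
print(tokens.shape)  # (100, 4): four parallel token streams per frame
```

The multiple token streams per frame are what distinguish the two SLM setups the paper compares: a mask-based system can predict the streams in parallel, while an AR+NAR system typically generates the first stream autoregressively and fills in the remaining streams non-autoregressively.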

Authors (16)
  1. Jiaqi Li (142 papers)
  2. Dongmei Wang (16 papers)
  3. Xiaofei Wang (138 papers)
  4. Yao Qian (37 papers)
  5. Long Zhou (57 papers)
  6. Shujie Liu (101 papers)
  7. Midia Yousefi (10 papers)
  8. Canrun Li (5 papers)
  9. Chung-Hsien Tsai (5 papers)
  10. Zhen Xiao (24 papers)
  11. Yanqing Liu (48 papers)
  12. Junkun Chen (27 papers)
  13. Sheng Zhao (75 papers)
  14. Jinyu Li (164 papers)
  15. Zhizheng Wu (45 papers)
  16. Michael Zeng (76 papers)
Citations (2)
