Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction (2506.12537v1)

Published 14 Jun 2025 in cs.CL, cs.AI, and eess.AS

Abstract: Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the impact of key components (i.e., speech tokenizers, speech heads, and speaker modeling) on the performance of LLM-centric SLMs. We compare coupled, semi-decoupled, and fully decoupled speech tokenizers under a fair SLM framework and find that decoupled tokenization significantly improves alignment and synthesis quality. To address the information density mismatch between speech and text, we introduce multi-token prediction (MTP) into SLMs, enabling each hidden state to decode multiple speech tokens. This leads to up to 12$\times$ faster decoding and a substantial drop in word error rate (from 6.07 to 3.01). Furthermore, we propose a speaker-aware generation paradigm and introduce RoleTriviaQA, a large-scale role-playing knowledge QA benchmark with diverse speaker identities. Experiments demonstrate that our methods enhance both knowledge understanding and speaker consistency.

Authors (24)
  1. Xiaoran Fan (23 papers)
  2. Zhichao Sun (12 papers)
  3. Yangfan Gao (1 paper)
  4. Jingfei Xiong (1 paper)
  5. Hang Yan (86 papers)
  6. Yifei Cao (4 papers)
  7. Jiajun Sun (17 papers)
  8. Shuo Li (179 papers)
  9. Zhihao Zhang (61 papers)
  10. Zhiheng Xi (37 papers)
  11. Yuhao Zhou (78 papers)
  12. Senjie Jin (10 papers)
  13. Changhao Jiang (7 papers)
  14. Junjie Ye (66 papers)
  15. Ming Zhang (313 papers)
  16. Rui Zheng (79 papers)
  17. Zhenhua Han (18 papers)
  18. Yunke Zhang (18 papers)
  19. Demei Yan (1 paper)
  20. Shaokang Dong (3 papers)