
On The Landscape of Spoken Language Models: A Comprehensive Survey (2504.08528v1)

Published 11 Apr 2025 in cs.CL, cs.SD, and eess.AS

Abstract: The field of spoken language processing is undergoing a shift from training custom-built, task-specific models toward using and optimizing spoken language models (SLMs) which act as universal speech processing systems. This trend is similar to the progression toward universal language models that has taken place in the field of (text) natural language processing. SLMs include both "pure" language models of speech -- models of the distribution of tokenized speech sequences -- and models that combine speech encoders with text LLMs, often including both spoken and written input or output. Work in this area is very diverse, with a range of terminology and evaluation settings. This paper aims to contribute an improved understanding of SLMs via a unifying literature survey of recent work in the context of the evolution of the field. Our survey categorizes the work in this area by model architecture, training, and evaluation choices, and describes some key challenges and directions for future work.

Authors (10)
  1. Siddhant Arora (50 papers)
  2. Kai-Wei Chang (292 papers)
  3. Chung-Ming Chien (13 papers)
  4. Yifan Peng (147 papers)
  5. Haibin Wu (85 papers)
  6. Yossi Adi (96 papers)
  7. Emmanuel Dupoux (81 papers)
  8. Hung-yi Lee (327 papers)
  9. Karen Livescu (89 papers)
  10. Shinji Watanabe (416 papers)

