Toward Joint Language Modeling for Speech Units and Text (2310.08715v1)

Published 12 Oct 2023 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability.
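
The abstract mentions building mixed speech-text training data from discrete speech units. As a rough illustration only, the sketch below interleaves the two modalities at the word level. It assumes each word is already aligned to its span of discrete unit IDs (for example, via forced alignment). The function name `mix_word_level`, the `[TEXT]`/`[SPEECH]` tags, the `<uN>` unit token format, and the per-word sampling probability are all hypothetical choices and are not taken from the paper's recipe.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical modality-switch markers (assumption, not from the paper).
TEXT_TAG = "[TEXT]"
SPEECH_TAG = "[SPEECH]"


def mix_word_level(
    words: Sequence[str],
    word_units: Sequence[Sequence[int]],
    p_speech: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Build one mixed token sequence by picking, per word, text or speech units.

    Assumes words[i] is aligned one-to-one with word_units[i].
    A modality tag is emitted only when the chosen modality changes.
    """
    if len(words) != len(word_units):
        raise ValueError("words and word_units must be aligned one-to-one")
    rng = random.Random(seed)  # seeded so the mixing is reproducible
    out: List[str] = []
    current: Optional[str] = None
    for word, units in zip(words, word_units):
        # rng.random() is in [0, 1), so p_speech=1.0 always picks speech
        # and p_speech=0.0 always picks text.
        modality = "speech" if rng.random() < p_speech else "text"
        if modality != current:
            out.append(SPEECH_TAG if modality == "speech" else TEXT_TAG)
            current = modality
        if modality == "speech":
            # Render each discrete unit ID as its own token, e.g. <u12>.
            out.extend(f"<u{u}>" for u in units)
        else:
            out.append(word)
    return out


if __name__ == "__main__":
    # Toy example with made-up unit IDs.
    print(mix_word_level(["hello", "world"], [[3, 7], [12]], p_speech=0.5, seed=1))
```

Emitting a tag only at modality switches keeps sequences short while still telling the model which vocabulary the following tokens come from; an actual joint LM would map both text tokens and unit tokens into one shared vocabulary.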

Authors (8)
  1. Ju-Chieh Chou (9 papers)
  2. Chung-Ming Chien (13 papers)
  3. Wei-Ning Hsu (76 papers)
  4. Karen Livescu (89 papers)
  5. Arun Babu (14 papers)
  6. Alexis Conneau (33 papers)
  7. Alexei Baevski (39 papers)
  8. Michael Auli (73 papers)
Citations (16)