BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing (2309.00916v2)

Published 2 Sep 2023 in cs.CL, cs.SD, and eess.AS

Abstract: The emergence of LLMs has sparked significant interest in extending their remarkable language capabilities to speech. However, modality alignment between speech and text remains an open problem. Current solutions fall into two categories. One is a cascaded approach, in which the outputs (tokens or states) of a separately trained speech recognition system serve as inputs to the LLM; this limits the potential for modeling alignment between speech and text. The other is an end-to-end approach that relies on speech instruction data, which is very difficult to collect in large quantities. In this paper, we address these issues and propose the BLSP approach, which Bootstraps Language-Speech Pre-training via behavior alignment of continuation writing. We achieve this by learning a lightweight modality adapter between a frozen speech encoder and an LLM, ensuring that the LLM exhibits the same generation behavior regardless of the input modality: a speech segment or its transcript. The training process has two steps. The first step prompts the LLM with speech transcripts as prefixes to obtain text continuations. In the second step, these continuations serve as supervised signals to train the modality adapter end-to-end. We demonstrate that this straightforward process can extend the capabilities of LLMs to speech, enabling speech recognition, speech translation, spoken language understanding, and speech conversation, even in zero-shot cross-lingual scenarios.
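
To make the two-step recipe concrete, below is a minimal PyTorch sketch of behavior-alignment training, assuming a HuggingFace-style causal LM that accepts `inputs_embeds` and `labels` and exposes `generate` and `get_input_embeddings`. The adapter architecture, the prompt wording, and the names `ModalityAdapter`, `generate_continuations`, and `behavior_alignment_loss` are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Lightweight trainable bridge from frozen speech-encoder states to the
    LLM embedding space (a small MLP here; the exact form is an assumption)."""
    def __init__(self, speech_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(speech_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, speech_states):
        return self.proj(speech_states)

# Step 1 (offline): prompt the frozen LLM with each transcript as a prefix
# and collect the text continuations it writes.
@torch.no_grad()
def generate_continuations(llm, tokenizer, transcripts,
                           prompt="Continue the following text:\n"):  # prompt is an assumption
    continuations = []
    for t in transcripts:
        ids = tokenizer(prompt + t, return_tensors="pt").input_ids
        out = llm.generate(ids, max_new_tokens=128)
        continuations.append(
            tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
    return continuations

# Step 2: train only the adapter so that the speech segment, fed in place of
# its transcript, reproduces the same continuation.
def behavior_alignment_loss(llm, adapter, speech_encoder, tokenizer,
                            waveform, continuation):
    with torch.no_grad():                            # speech encoder stays frozen
        speech_states = speech_encoder(waveform)     # (1, T, speech_dim)
    speech_embeds = adapter(speech_states)           # (1, T, llm_dim), trainable path
    target_ids = tokenizer(continuation, return_tensors="pt").input_ids
    target_embeds = llm.get_input_embeddings()(target_ids)
    inputs_embeds = torch.cat([speech_embeds, target_embeds], dim=1)
    # Supervise only the continuation tokens; -100 masks the speech prefix
    # out of the LM loss.
    labels = torch.cat(
        [torch.full((1, speech_embeds.size(1)), -100, dtype=torch.long),
         target_ids], dim=1)
    return llm(inputs_embeds=inputs_embeds, labels=labels).loss
```

A training loop would optimize only `adapter.parameters()`, keeping both the speech encoder and the LLM frozen, which is what makes the adapter "lightweight" in the sense the abstract describes.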

Authors (8)
  1. Chen Wang (599 papers)
  2. Minpeng Liao (11 papers)
  3. Zhongqiang Huang (20 papers)
  4. Jinliang Lu (8 papers)
  5. Junhong Wu (10 papers)
  6. Yuchen Liu (156 papers)
  7. Chengqing Zong (65 papers)
  8. Jiajun Zhang (176 papers)
Citations (23)