Bridging Speech and Textual Pre-trained Models with Unsupervised ASR (2211.03025v1)

Published 6 Nov 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements on various SLU tasks. However, because of the mismatch between speech signals and text tokens, previous methods usually require complex framework designs. This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various SLU tasks. Specifically, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges the different modalities used in speech and textual pre-trained models. Our experiments show that unsupervised ASR itself can improve the representations from speech self-supervised models. More importantly, it is shown to be an efficient connector between speech and textual pre-trained models, improving performance on five different SLU tasks. Notably, on spoken question answering, we achieve state-of-the-art results on the challenging NMSQA benchmark.
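The abstract describes a pipeline in which an unsupervised ASR module converts the output of a speech self-supervised encoder into text that a textual pre-trained model can consume. The sketch below illustrates that data flow only; the specific checkpoints (a wav2vec 2.0-style encoder, a BERT-style text model) and the `unsupervised_asr_decode` helper are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a speech -> unsupervised ASR -> text PLM pipeline,
# assuming HuggingFace-style speech and text encoders. The unsupervised
# ASR connector is stubbed out, since it is the paper's contribution.
import torch
from transformers import AutoProcessor, AutoModel, AutoTokenizer

# 1) Speech self-supervised encoder (checkpoint name is an assumption).
speech_processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base")
speech_encoder = AutoModel.from_pretrained("facebook/wav2vec2-base")

# 2) Textual pre-trained model that consumes the decoded transcript.
text_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")


def unsupervised_asr_decode(features: torch.Tensor) -> str:
    """Placeholder for a wav2vec-U-style unsupervised ASR decoder that maps
    frame-level speech features to a token sequence without paired data."""
    raise NotImplementedError  # stands in for the unsupervised ASR connector


def speech_to_semantics(waveform, sampling_rate: int = 16_000) -> torch.Tensor:
    # Extract frame-level representations from the speech SSL model.
    inputs = speech_processor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        speech_feats = speech_encoder(**inputs).last_hidden_state

    # Bridge modalities: decode speech features into text unsupervisedly.
    transcript = unsupervised_asr_decode(speech_feats)

    # Encode the transcript with the textual pre-trained model.
    text_inputs = text_tokenizer(transcript, return_tensors="pt")
    with torch.no_grad():
        semantics = text_encoder(**text_inputs).last_hidden_state
    return semantics  # fed to a downstream SLU head (intent, slots, QA, ...)
```

Under this reading, the unsupervised ASR acts purely as a modality adapter: no paired speech-text supervision is needed to connect the two pre-trained models, which is what keeps the overall framework simple.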

Authors (8)
  1. Jiatong Shi (82 papers)
  2. Chan-Jan Hsu (16 papers)
  3. Dongji Gao (8 papers)
  4. Shinji Watanabe (416 papers)
  5. Ann Lee (29 papers)
  6. Hung-yi Lee (325 papers)
  7. HoLam Chung (2 papers)
  8. Paola Garcia (22 papers)
Citations (11)