Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding (2305.12301v1)

Published 20 May 2023 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: The pre-trained speech encoder wav2vec 2.0 performs very well on various spoken language understanding (SLU) tasks. However, on many tasks, it trails behind text encoders with textual input. To improve the understanding capability of SLU encoders, various studies have used knowledge distillation to transfer knowledge from natural language understanding (NLU) encoders. We use a very simple method of distilling from a textual sentence embedder directly into wav2vec 2.0 as pre-training, utilizing paired audio-text datasets. We observed that this method is indeed capable of improving SLU task performance in fine-tuned settings, as well as full-data and few-shot transfer on a frozen encoder. However, the model performs worse on certain tasks, highlighting the strengths and weaknesses of our approach.
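At a high level, the distillation described in the abstract can be pictured as regressing a pooled wav2vec 2.0 utterance representation onto the frozen sentence embedding of the paired transcript. The sketch below illustrates that idea only; the specific checkpoints (facebook/wav2vec2-base, all-MiniLM-L6-v2), mean pooling, the linear projection, and the MSE loss are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Hedged sketch of sentence-embedder-guided pre-training of a speech encoder.
# Checkpoints, pooling, projection, and loss are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model
from sentence_transformers import SentenceTransformer

speech_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")  # student
text_embedder = SentenceTransformer("all-MiniLM-L6-v2")                   # frozen teacher (assumed)
proj = nn.Linear(speech_encoder.config.hidden_size,
                 text_embedder.get_sentence_embedding_dimension())
loss_fn = nn.MSELoss()

def distillation_loss(waveforms: torch.Tensor, transcripts: list[str]) -> torch.Tensor:
    # Student: mean-pool wav2vec 2.0 frame features into one utterance vector.
    frames = speech_encoder(waveforms).last_hidden_state      # (B, T, H)
    student = proj(frames.mean(dim=1))                        # (B, D)
    # Teacher: frozen sentence embeddings of the paired transcripts.
    with torch.no_grad():
        teacher = text_embedder.encode(transcripts, convert_to_tensor=True)  # (B, D)
    return loss_fn(student, teacher.to(student.device))
```

After this pre-training stage, the speech encoder would be fine-tuned (or frozen and probed) on downstream SLU tasks, which is where the abstract reports the gains and the task-dependent weaknesses.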

Authors (3)
  1. Yi Xuan Tan (1 paper)
  2. Navonil Majumder (48 papers)
  3. Soujanya Poria (138 papers)
Citations (2)