Introducing Semantics into Speech Encoders (2211.08402v1)

Published 15 Nov 2022 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: Recent studies find that existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined systems that feed supervised automatic speech recognition (ASR) output into a large language model (LLM) achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic, unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve the spoken language understanding performance of existing speech encoders by over 10% on intent classification, with modest gains in named entity resolution and slot filling, and improve spoken question answering F1 score by over 2%. Our unsupervised approach achieves performance similar to supervised methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentation of existing speech encoders.

Authors (13)
  1. Derek Xu (10 papers)
  2. Shuyan Dong (7 papers)
  3. Changhan Wang (46 papers)
  4. Suyoun Kim (22 papers)
  5. Zhaojiang Lin (45 papers)
  6. Akshat Shrivastava (25 papers)
  7. Shang-Wen Li (55 papers)
  8. Liang-Hsuan Tseng (9 papers)
  9. Alexei Baevski (39 papers)
  10. Guan-Ting Lin (21 papers)
  11. Hung-yi Lee (327 papers)
  12. Yizhou Sun (149 papers)
  13. Wei Wang (1793 papers)
Citations (3)