Deliberation Model for On-Device Spoken Language Understanding (2204.01893v3)

Published 4 Apr 2022 in cs.CL and eess.AS

Abstract: We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR's text and audio embeddings. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, the sharing of parameters between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments; our proposed approach consistently outperforms strong pipeline NLU baselines by 0.60% to 0.65% on the spoken version of the TOPv2 dataset (STOP). We demonstrate that the fusion of text and audio features, coupled with the system's ability to rewrite the first-pass hypothesis, makes our approach more robust to ASR errors. Finally, we show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training, but more work is required to make text-to-speech (TTS) a viable solution for scaling up E2E SLU.
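The abstract's central idea is that the second-pass NLU component conditions on both the first-pass ASR text and the audio embeddings. The following is a minimal sketch of that fusion step. It assumes a single scaled dot-product attention over the concatenation of the two embedding sequences. The function names (`attend`, `deliberation_context`) are hypothetical. This is not the authors' implementation, and their actual fusion mechanism may differ.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of one decoder query over a memory sequence."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, m) * scale for m in memory])
    dim = len(memory[0])
    # The output is a convex combination of the memory vectors.
    return [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]


def deliberation_context(query: Vector,
                         text_emb: List[Vector],
                         audio_emb: List[Vector]) -> Vector:
    """Hypothetical fusion: attend jointly over first-pass text and audio embeddings.

    Assumption: both sources share one embedding dimension and are simply
    concatenated along the time axis. The paper may fuse them differently.
    """
    memory = text_emb + audio_emb
    return attend(query, memory)


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings of first-pass hypothesis tokens
    audio = [[0.5, 0.5]]             # toy audio encoder frame
    print(deliberation_context([1.0, 0.0], text, audio))
```

Because the decoder can attend to audio frames as well as to the first-pass tokens, a misrecognized token need not be copied verbatim into the semantic parse. The toy vectors above only illustrate the data flow. In a real model, the query would come from the NLU decoder state and the memories from learned encoders.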

Authors (7)
  1. Duc Le (46 papers)
  2. Akshat Shrivastava (25 papers)
  3. Paden Tomasello (17 papers)
  4. Suyoun Kim (22 papers)
  5. Aleksandr Livshits (5 papers)
  6. Ozlem Kalinli (49 papers)
  7. Michael L. Seltzer (34 papers)
Citations (12)