
Server-side Rescoring of Spoken Entity-centric Knowledge Queries for Virtual Assistants (2311.01398v1)

Published 2 Nov 2023 in cs.CL, cs.SD, and eess.AS

Abstract: On-device Virtual Assistants (VAs) powered by Automatic Speech Recognition (ASR) require effective knowledge integration to recognize challenging entity-rich queries. In this paper, we conduct an empirical study of modeling strategies for server-side rescoring of spoken information-domain queries using various categories of language models (LMs): N-gram word LMs and sub-word neural LMs. We investigate the combination of on-device and server-side signals, and demonstrate significant WER improvements of 23%-35% on various entity-centric query subpopulations by integrating various server-side LMs compared to performing ASR on-device only. We also compare LMs trained on domain data against a GPT-3 variant offered by OpenAI as a baseline. Furthermore, we show that model fusion of multiple server-side LMs trained from scratch most effectively combines the complementary strengths of each model and integrates knowledge learned from domain-specific data into a VA ASR system.
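
As a rough illustration of the rescoring idea the abstract describes, the sketch below re-ranks an N-best list by a weighted sum of an on-device score and scores from several server-side LMs. It is a generic log-linear interpolation under stated assumptions, not the paper's actual method: the `Hypothesis` class, the `rescore` function, the weights, and all score values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    on_device_score: float         # log-score from the on-device ASR pass (made-up value)
    server_lm_scores: list[float]  # log-probabilities from each server-side LM (made-up values)


def rescore(nbest, lm_weights, on_device_weight=1.0):
    """Return hypotheses sorted best-first by a weighted sum of log-scores."""
    def fused(h):
        # Log-linear fusion: each server-side LM contributes weight * log-probability.
        lm_part = sum(w * s for w, s in zip(lm_weights, h.server_lm_scores))
        return on_device_weight * h.on_device_score + lm_part
    return sorted(nbest, key=fused, reverse=True)


# Toy N-best list: the on-device pass slightly prefers the misspelled entity,
# while both (hypothetical) server-side LMs prefer the correct one.
nbest = [
    Hypothesis("who is tailor swift", -4.0, [-9.0, -8.5]),
    Hypothesis("who is taylor swift", -4.2, [-6.0, -5.5]),
]
best = rescore(nbest, lm_weights=[0.5, 0.5])[0]
print(best.text)  # prints "who is taylor swift"
```

In a sketch like this, the interpolation weights would typically be tuned on held-out data; setting all LM weights to zero falls back to the on-device ranking.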

Authors (4)
  1. Youyuan Zhang (8 papers)
  2. Sashank Gondala (3 papers)
  3. Thiago Fraga-Silva (3 papers)
  4. Christophe Van Gysel (24 papers)
Citations (1)
