
Attention-based Contextual Language Model Adaptation for Speech Recognition (2106.01451v1)

Published 2 Jun 2021 in cs.CL and cs.AI

Abstract: Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance-level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance was spoken, provides a rich input signal. We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information. When evaluated on utterances extracted from the long tail of the dataset, our method improves perplexity by 9.0% relative over a standard LM and by over 2.8% relative when compared to a state-of-the-art model for contextual LM.
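
The abstract describes attending over non-linguistic context signals while training a neural LM. As an illustration only, the minimal sketch below shows one way such a mechanism could look: an LSTM language model whose hidden states attend over embeddings of categorical context signals (e.g., time of day) before predicting the next token. The class name, layer choices, and context representation are assumptions for this sketch, not the authors' exact architecture.

```python
# Hypothetical sketch (not the paper's exact model): an LSTM LM that attends
# over non-linguistic context embeddings and fuses the result into its output.
import torch
import torch.nn as nn

class ContextualAttentionLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_context_values, context_dim):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # One embedding table for categorical context values (assumption for illustration).
        self.context_embed = nn.Embedding(num_context_values, context_dim)
        self.query_proj = nn.Linear(hidden_dim, context_dim)
        self.output = nn.Linear(hidden_dim + context_dim, vocab_size)

    def forward(self, tokens, context_ids):
        # tokens: (batch, seq_len); context_ids: (batch, num_context_signals)
        h, _ = self.lstm(self.token_embed(tokens))                 # (B, T, H)
        ctx = self.context_embed(context_ids)                      # (B, C, D)
        # Each hidden state queries the set of context embeddings.
        scores = torch.einsum("btd,bcd->btc", self.query_proj(h), ctx)
        weights = torch.softmax(scores, dim=-1)                     # (B, T, C)
        ctx_summary = torch.einsum("btc,bcd->btd", weights, ctx)    # (B, T, D)
        # Fuse the attended context with the LM state to score the next token.
        return self.output(torch.cat([h, ctx_summary], dim=-1))     # (B, T, vocab)

# Usage sketch:
# model = ContextualAttentionLM(vocab_size=10000, embed_dim=128, hidden_dim=256,
#                               num_context_values=24, context_dim=32)
# logits = model(torch.randint(0, 10000, (4, 12)), torch.randint(0, 24, (4, 3)))
```

One design point this sketch illustrates: attention lets the model weight each context signal per time step, rather than concatenating a fixed context vector to every input, which is the behavior a simpler contextual baseline would have.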

Authors (6)
  1. Richard Diehl Martinez (13 papers)
  2. Scott Novotney (3 papers)
  3. Ivan Bulyko (23 papers)
  4. Ariya Rastrow (55 papers)
  5. Andreas Stolcke (57 papers)
  6. Ankur Gandhe (30 papers)
Citations (5)