
Context Dependent RNNLM for Automatic Transcription of Conversations (2008.03517v1)

Published 8 Aug 2020 in eess.AS

Abstract: Conversational speech, while unstructured at the utterance level, typically has a macro topic that provides a larger context spanning multiple utterances. The current language models in speech recognition systems based on recurrent neural networks (RNNLM) rely mainly on the local context and exclude this larger context. In order to model the long-term dependencies of words across multiple sentences, we propose a novel architecture in which the words from prior utterances are converted to an embedding. The relevance of these embeddings for the prediction of the next word in the current sentence is determined by a gating network. The relevance-weighted context embedding vector is combined in the language model to improve next-word prediction, and the entire model, including the context embedding and relevance weighting layers, is jointly learned for a conversational language modeling task. Experiments are performed on two conversational datasets: the AMI corpus and the Switchboard corpus. On these tasks, we show that the proposed approach yields significant improvements in language model perplexity over the RNNLM baseline. In addition, using the proposed conversational LM for ASR rescoring results in an absolute WER reduction of 1.2% on the Switchboard dataset and 1.0% on the AMI dataset over the RNNLM-based ASR baseline.
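The abstract outlines the core mechanism: embed the prior utterances, gate that context embedding by its relevance to the current prediction, and feed the weighted context into the language model. The sketch below illustrates one plausible realization of this idea in PyTorch; the module names, dimensions, the mean-of-word-embeddings context encoder, and the exact way the gated context is combined with the RNN state are all assumptions, not the paper's verified architecture.

```python
# Hypothetical sketch of a context-dependent RNNLM with relevance gating.
# Assumptions: an LSTM language model, a sigmoid gating network over the
# concatenated RNN state and context embedding, and concatenation of the
# gated context with the RNN state before the output projection.
import torch
import torch.nn as nn

class ContextGatedRNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, ctx_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Gating network: scores the relevance of the conversation-context
        # embedding given the current RNN state (assumed design).
        self.gate = nn.Sequential(
            nn.Linear(hidden_dim + ctx_dim, ctx_dim),
            nn.Sigmoid(),
        )
        self.out = nn.Linear(hidden_dim + ctx_dim, vocab_size)

    def forward(self, tokens, context_embedding):
        # tokens: (batch, seq_len) word ids of the current utterance.
        # context_embedding: (batch, ctx_dim) embedding of prior utterances,
        # e.g. a mean of their word embeddings (an assumption here).
        x = self.embed(tokens)
        h, _ = self.rnn(x)                            # (batch, seq_len, hidden_dim)
        ctx = context_embedding.unsqueeze(1).expand(-1, h.size(1), -1)
        g = self.gate(torch.cat([h, ctx], dim=-1))    # relevance weights in (0, 1)
        weighted_ctx = g * ctx                        # relevance-weighted context
        logits = self.out(torch.cat([h, weighted_ctx], dim=-1))
        return logits                                 # next-word prediction scores
```

Because the gate is conditioned on the RNN state, the model can down-weight the conversation context when the local history suffices, consistent with the joint training of the context embedding and relevance weighting layers described in the abstract.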

Authors (2)
  1. Srikanth Raj Chetupalli (20 papers)
  2. Sriram Ganapathy (72 papers)
Citations (3)