Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion (1906.11604v1)

Published 27 Jun 2019 in cs.CL, cs.SD, and eess.AS

Abstract: We present a novel conversational-context aware end-to-end speech recognizer based on a gated neural network that incorporates conversational-context/word/speech embeddings. Unlike conventional speech recognition models, our model learns longer conversational-context information that spans across sentences and is consequently better at recognizing long conversations. Specifically, we propose to use the text-based external word and/or sentence embeddings (i.e., fastText, BERT) within an end-to-end framework, yielding a significant improvement in word error rate with better conversational-context representation. We evaluated the models on the Switchboard conversational speech corpus and show that our model outperforms standard end-to-end speech recognition models.
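The abstract does not spell out the gate itself, so the following is only a minimal sketch of one common form of gated fusion, not the authors' architecture. It assumes two equal-length vectors (a speech-side embedding and a conversational-context embedding) and an element-wise sigmoid gate computed from their concatenation. The names `gated_fusion`, `speech_vec`, `context_vec`, `weights`, and `bias` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(speech_vec, context_vec, weights, bias):
    """Blend two equal-length vectors with an element-wise gate.

    Assumption (not taken from the paper): the gate for dimension i is
    sigmoid(weights[i] . [speech_vec; context_vec] + bias[i]), and the
    output is g * speech + (1 - g) * context.
    """
    if len(speech_vec) != len(context_vec):
        raise ValueError("both embeddings must have the same length")
    joint = list(speech_vec) + list(context_vec)  # concatenated gate input
    fused = []
    for i, (s, c) in enumerate(zip(speech_vec, context_vec)):
        pre_activation = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(pre_activation)
        fused.append(g * s + (1.0 - g) * c)
    return fused


# Toy values only: with zero weights and bias, every gate is exactly 0.5,
# so the output is the element-wise mean of the two inputs.
speech = [1.0, 2.0]
context = [3.0, 4.0]
zero_weights = [[0.0] * 4, [0.0] * 4]
print(gated_fusion(speech, context, zero_weights, [0.0, 0.0]))
```

In a trained model the weights and bias would be learned jointly with the recognizer, so the gate can decide per dimension how much context to let through. Here they are fixed by hand to keep the behaviour easy to check.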

Authors (3)
  1. Suyoun Kim (22 papers)
  2. Siddharth Dalmia (36 papers)
  3. Florian Metze (79 papers)
Citations (23)