Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard (2001.07263v3)

Published 20 Jan 2020 in eess.AS and cs.CL

Abstract: It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of data, at least a thousand hours, is available for training. In this paper, we show that state-of-the-art recognition performance can be achieved on the Switchboard-300 database using a single-headed attention, LSTM-based model. Using a cross-utterance language model, our single-pass speaker-independent system reaches 6.4% and 12.5% word error rate (WER) on the Switchboard and CallHome subsets of Hub5'00, without a pronunciation lexicon. While careful regularization and data augmentation are crucial in achieving this level of performance, experiments on Switchboard-2000 show that nothing is more useful than more data. Overall, the combination of various regularizations and a simple but fairly large model results in a new state of the art, 4.7% and 7.8% WER on the Switchboard and CallHome sets, using SWB-2000 without any external data resources.
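
For intuition, below is a minimal sketch of a single-headed attention encoder-decoder of the kind the abstract describes. The additive (Bahdanau-style) attention, the layer sizes, and all class and function names are illustrative assumptions, not the authors' implementation, which also depends on the regularization and data augmentation emphasized above.

```python
# Minimal sketch of a single-headed attention seq2seq ASR model (LSTM encoder/decoder).
# Layer sizes, the additive attention variant, and all names here are illustrative
# assumptions; they are not the authors' exact architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleHeadAttention(nn.Module):
    """Additive attention with a single head over encoder states."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim, bias=False)
        self.dec_proj = nn.Linear(dec_dim, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, frames, enc_dim); dec_state: (batch, dec_dim)
        energy = self.score(torch.tanh(
            self.enc_proj(enc_states) + self.dec_proj(dec_state).unsqueeze(1)
        )).squeeze(-1)                        # (batch, frames)
        weights = F.softmax(energy, dim=-1)   # single attention distribution
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights


class Seq2SeqASR(nn.Module):
    """Bidirectional LSTM encoder + single-headed attention + LSTM decoder."""

    def __init__(self, feat_dim=40, vocab_size=600, enc_dim=512, dec_dim=512, attn_dim=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, enc_dim // 2, num_layers=4,
                               bidirectional=True, batch_first=True)
        self.embed = nn.Embedding(vocab_size, dec_dim)
        self.attention = SingleHeadAttention(enc_dim, dec_dim, attn_dim)
        self.decoder = nn.LSTMCell(dec_dim + enc_dim, dec_dim)
        self.output = nn.Linear(dec_dim + enc_dim, vocab_size)

    def forward(self, feats, targets):
        # feats: (batch, frames, feat_dim) acoustic features
        # targets: (batch, out_len) output token ids, used with teacher forcing
        enc_states, _ = self.encoder(feats)
        batch = feats.size(0)
        h = feats.new_zeros(batch, self.decoder.hidden_size)
        c = feats.new_zeros(batch, self.decoder.hidden_size)
        logits = []
        for t in range(targets.size(1)):
            context, _ = self.attention(enc_states, h)
            step_in = torch.cat([self.embed(targets[:, t]), context], dim=-1)
            h, c = self.decoder(step_in, (h, c))
            logits.append(self.output(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)  # (batch, out_len, vocab_size)
```

Training such a model would typically minimize cross-entropy against the shifted target sequence with teacher forcing; the careful regularization and data augmentation that the abstract credits for the reported WER are omitted from this sketch for brevity.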

Authors (4)
  1. George Saon (39 papers)
  2. Kartik Audhkhasi (22 papers)
  3. Brian Kingsbury (54 papers)
  4. Zoltán Tüske (7 papers)
Citations (68)
