Character-Level Incremental Speech Recognition with Recurrent Neural Networks (1601.06581v2)

Published 25 Jan 2016 in cs.CL, cs.LG, and cs.NE

Abstract: In real-time speech recognition applications, the latency is an important issue. We have developed a character-level incremental speech recognition (ISR) system that responds quickly even during the speech, where the hypotheses are gradually improved while the speaking proceeds. The algorithm employs a speech-to-character unidirectional recurrent neural network (RNN), which is end-to-end trained with connectionist temporal classification (CTC), and an RNN-based character-level LLM (LM). The output values of the CTC-trained RNN are character-level probabilities, which are processed by beam search decoding. The RNN LM augments the decoding by providing long-term dependency information. We propose tree-based online beam search with additional depth-pruning, which enables the system to process infinitely long input speech with low latency. This system not only responds quickly on speech but also can dictate out-of-vocabulary (OOV) words according to pronunciation. The proposed model achieves the word error rate (WER) of 8.90% on the Wall Street Journal (WSJ) Nov'92 20K evaluation set when trained on the WSJ SI-284 training set.

Authors (2)

Kyuyeon Hwang (12 papers)
Wonyong Sung (33 papers)

Citations (61)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Character-Level Incremental Speech Recognition with Recurrent Neural Networks (1601.06581v2)

Summary

Related Papers