- The paper introduces Highway LSTM (HLSTM) RNNs, which add gated direct connections between the memory cells of adjacent layers to mitigate the vanishing gradient problem in deep speech models.
- It demonstrates that this architecture enables training deeper networks, achieving WER reductions of 15.7% over DNNs and 5.3% over baseline deep LSTM (DLSTM) RNNs on the AMI dataset.
- In addition, dropout on the highway connections and latency-controlled bidirectional LSTMs (LC-BLSTMs) improve accuracy while keeping inference latency manageable for real-time distant speech recognition.
Highway Long Short-Term Memory RNNs for Distant Speech Recognition
The paper presents an approach to improving deep LSTM (Long Short-Term Memory) recurrent neural networks for distant speech recognition by introducing highway connections. It is grounded in the premise that deep LSTM RNNs, though proficient at handling temporal dependencies, suffer from vanishing gradients when scaled to greater depths. The authors propose gated direct connections, termed highway connections, between the memory cells of adjacent layers to mitigate this issue.
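To make the mechanism concrete, the following is a minimal NumPy sketch of one time step of a highway LSTM layer. It assumes a depth (carry) gate that mixes the lower layer's cell state directly into the current layer's cell state; the parameter names and exact gate parameterization are illustrative rather than the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hlstm_step(x_t, h_prev, c_prev, c_lower, p):
    """One time step of a highway LSTM layer (illustrative sketch).

    x_t     : input from the layer below at time t
    h_prev  : this layer's hidden state at t-1
    c_prev  : this layer's cell state at t-1
    c_lower : the lower layer's cell state at time t (highway input)
    p       : dict of weights; names like "W_i", "w_cd" are illustrative
    """
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(p["W_i"] @ z + p["b_i"])          # input gate
    f = sigmoid(p["W_f"] @ z + p["b_f"])          # forget gate
    o = sigmoid(p["W_o"] @ z + p["b_o"])          # output gate
    g = np.tanh(p["W_g"] @ z + p["b_g"])          # candidate cell update
    # Depth (carry) gate controlling how much of the lower layer's cell
    # state flows directly into this layer's cell state.
    d = sigmoid(p["W_d"] @ x_t + p["w_cd"] * c_prev
                + p["w_ld"] * c_lower + p["b_d"])
    c = d * c_lower + f * c_prev + i * g          # gated direct cell-to-cell path
    h = o * np.tanh(c)
    return h, c
```

The key difference from a plain stacked LSTM is the `d * c_lower` term, which gives gradients a gated shortcut across layers in depth.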
The central innovation is the Highway LSTM (HLSTM) RNN, which incorporates these gated direct connections, allowing information to flow more directly across layers and alleviating the vanishing gradient problem. This architecture makes it feasible to train deeper networks, offering potential gains in model performance. The authors also present latency-controlled bidirectional LSTMs (LC-BLSTMs), which exploit the full historical context while bounding the future context, and hence the inference latency, by processing utterances in chunks with a limited look-ahead, a crucial consideration for real-time applications.
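The latency-control idea can be sketched roughly as below: the forward LSTM carries its state across fixed-size chunks so it still sees the full history, while the backward LSTM is re-initialized for each chunk using only a bounded number of look-ahead frames. The chunking routine, step-function signatures, and default sizes are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def lc_blstm_pass(frames, fwd_step, bwd_step, state_dim, n_c=40, n_r=20):
    """Latency-controlled bidirectional pass over an utterance (sketch).

    frames   : (T, D) array of acoustic frames
    fwd_step : function (x_t, state) -> (output, new_state) for the forward LSTM
    bwd_step : same signature for the backward LSTM
    n_c      : frames emitted per chunk; n_r : look-ahead frames (illustrative sizes)
    """
    T = frames.shape[0]
    fwd_state = np.zeros(state_dim)            # carried across chunks: full history
    outputs = []
    for start in range(0, T, n_c):
        chunk = frames[start:start + n_c]
        lookahead = frames[start + n_c:start + n_c + n_r]
        # Forward direction: continue from the previous chunk's state.
        fwd_out = []
        for x in chunk:
            y, fwd_state = fwd_step(x, fwd_state)
            fwd_out.append(y)
        # Backward direction: re-initialized each chunk, primed only on the
        # look-ahead frames, so latency is bounded by n_c + n_r frames.
        bwd_state = np.zeros(state_dim)
        for x in lookahead[::-1]:
            _, bwd_state = bwd_step(x, bwd_state)
        bwd_out = []
        for x in chunk[::-1]:
            y, bwd_state = bwd_step(x, bwd_state)
            bwd_out.append(y)
        bwd_out.reverse()
        outputs.extend(np.concatenate([f, b]) for f, b in zip(fwd_out, bwd_out))
    return np.stack(outputs)
```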
Empirical evaluations are carried out on the AMI single distant microphone (SDM) dataset. The results are compelling: the proposed highway LSTM RNNs achieve substantial improvements over existing deep LSTM benchmarks, with WER reductions of approximately 15.7% relative to DNNs and 5.3% relative to baseline DLSTM RNNs. These improvements underscore the robustness of highway connections, particularly under sequence-level training. Applying dropout to the highway connections further improves performance by regularizing their activity.
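As a rough illustration of this regularization, the sketch below masks the gated lower-layer cell contribution during training; exactly where the mask is applied relative to the depth gate is an assumption, not a detail confirmed by the paper.

```python
import numpy as np

def highway_dropout(d, c_lower, drop_prob, training, rng=None):
    """Dropout on the highway (cell-to-cell) contribution during training.

    d        : depth-gate activation at this time step
    c_lower  : lower layer's cell state feeding the highway connection
    Placement of the mask is one plausible reading, for illustration only.
    """
    if rng is None:
        rng = np.random.default_rng()
    if training and drop_prob > 0.0:
        keep = (rng.random(c_lower.shape) >= drop_prob) / (1.0 - drop_prob)
        return d * c_lower * keep   # inverted dropout: drop and rescale kept units
    return d * c_lower
```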
The theoretical implications align with the broader discourse on network depth and training scalability: the ability to train substantially deeper recurrent networks, combined with bounded inference latency, opens up expansive possibilities in modeling capability and applied AI systems. Practically, the reported WERs, 43.9% on the AMI (SDM) development set and 47.7% on the evaluation set, establish a strong benchmark for distant speech recognition.
Looking forward, such architectural advances in recurrent networks could extend well beyond speech recognition, potentially influencing other domains that process sequential data with complex temporal dependencies. The paper offers a robust framework that merits further exploration, especially in combining highway connections with other neural paradigms and in validating these methods across more diverse datasets.