- The paper presents IndRNN, in which neurons within a layer are independent of each other and each neuron's recurrent weight is regulated to address vanishing and exploding gradients, enabling learning over sequences of more than 5000 steps.
- It works with non-saturating activation functions such as ReLU and can be stacked to construct deeper networks, surpassing traditional RNN limitations.
- Experimental results demonstrate that IndRNN outperforms LSTM and GRU in tasks such as sequential MNIST, language modeling, and skeleton-based action recognition.
Independently Recurrent Neural Network (IndRNN): Building Longer and Deeper RNNs
Introduction
The paper introduces the Independently Recurrent Neural Network (IndRNN) to address the limitations of traditional recurrent neural networks (RNNs), in particular the difficulty of training them caused by vanishing and exploding gradients. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks mitigate these issues with gating, but the saturating hyperbolic tangent and sigmoid activations used in their gates cause gradient decay over layers, so they still struggle to learn very long-term dependencies and to be stacked deeply.
Key Contributions
The IndRNN is designed so that neurons within the same layer operate independently of each other and exchange information only across layers. This architecture offers several benefits (a minimal code sketch of the resulting recurrence follows this list):
- Gradient Control: Each neuron's gradient through time depends only on its own scalar recurrent weight, so constraining that weight regulates backpropagation through time and prevents both vanishing and exploding gradients, enabling long-term dependencies to be learned over very long sequences.
- Non-Saturating Activation Functions: IndRNN trains robustly with activations such as ReLU, avoiding the gradient decay caused by the saturating gate activations of LSTM and GRU.
- Stacking Capability: IndRNN layers can be stacked, including with residual connections, to build networks substantially deeper than typical RNN architectures.
- Interpretability: The independence of neurons enhances the interpretability of each neuron's function within a layer.
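To make the recurrence concrete, here is a minimal NumPy sketch of an IndRNN layer, h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), and of stacking such layers. The sizes, initialization, and the clip value for the recurrent weights are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def indrnn_layer(x_seq, W, u, b, u_max=None):
    """One IndRNN layer: h_t = relu(W @ x_t + u * h_{t-1} + b).

    x_seq: (T, input_size) input sequence
    W:     (hidden_size, input_size) input-to-hidden weights
    u:     (hidden_size,) per-neuron recurrent weights (element-wise, no mixing)
    b:     (hidden_size,) bias
    u_max: optional bound on |u| to keep per-neuron gradients from exploding
    """
    if u_max is not None:
        u = np.clip(u, -u_max, u_max)          # constrain recurrent weights
    T = x_seq.shape[0]
    h = np.zeros(W.shape[0])
    outputs = np.empty((T, W.shape[0]))
    for t in range(T):
        # Each neuron sees only its own previous state (u * h), not the whole layer.
        h = np.maximum(0.0, W @ x_seq[t] + u * h + b)   # ReLU activation
        outputs[t] = h
    return outputs

def stacked_indrnn(x_seq, params, u_max=2.0):
    """Stack several IndRNN layers; cross-neuron interaction happens only here,
    through the input-to-hidden weights W of the next layer."""
    out = x_seq
    for W, u, b in params:
        out = indrnn_layer(out, W, u, b, u_max=u_max)
    return out

# Illustrative usage with assumed sizes: 100 steps, 16 inputs, 64 hidden units, 2 layers.
rng = np.random.default_rng(0)
sizes = [(64, 16), (64, 64)]
params = [(rng.normal(0, 0.1, s), rng.uniform(-1, 1, s[0]), np.zeros(s[0])) for s in sizes]
x = rng.normal(size=(100, 16))
h = stacked_indrnn(x, params)          # (100, 64) hidden states of the top layer
```

The key contrast with a vanilla RNN is that the recurrent term is an element-wise product u * h rather than a full matrix product U h, which is what makes each neuron's temporal dynamics independent and easy to regulate.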
Experimental Results
IndRNN demonstrates superior performance across multiple tasks:
- Adding Problem: IndRNN solves this long-term memory task for sequences of over 5000 steps, far beyond the sequence lengths LSTM can handle (a sketch of how such sequences are typically generated follows this list).
- Sequential MNIST and Permuted MNIST: A six-layer IndRNN achieves lower error rates than existing RNN models, demonstrating its ability to process long pixel-by-pixel sequences.
- Language Modeling on the Penn Treebank Dataset: Using a 21-layer residual IndRNN, the model achieves competitive results in character-level language modeling, showing that very deep IndRNNs remain trainable and efficient.
- Skeleton-Based Action Recognition: On the NTU RGB+D dataset, the IndRNN shows significant performance improvements over LSTM and traditional RNN models, indicating its robustness in handling real-world sequence data.
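As a concrete illustration of the adding problem referenced above, the sketch below generates inputs in the form commonly used in the literature: two channels per step, a stream of random values and a binary marker selecting two positions (one in each half of the sequence), with the sum of the two marked values as the regression target. The batch size, sequence length, and exact placement scheme are assumptions for illustration and may differ from the paper's setup.

```python
import numpy as np

def adding_problem_batch(batch_size, seq_len, rng=None):
    """Generate one batch of the adding problem.

    Returns:
      x: (batch_size, seq_len, 2) -- channel 0 holds uniform random values,
         channel 1 is a 0/1 marker with exactly two 1s per sequence.
      y: (batch_size,) -- sum of the two marked values (the regression target).
    """
    rng = rng or np.random.default_rng()
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    y = np.empty(batch_size)
    for i in range(batch_size):
        # One marker in each half of the sequence, so the model must remember
        # a value across (potentially thousands of) intervening steps.
        first = rng.integers(0, seq_len // 2)
        second = rng.integers(seq_len // 2, seq_len)
        markers[i, first] = markers[i, second] = 1.0
        y[i] = values[i, first] + values[i, second]
    x = np.stack([values, markers], axis=-1)
    return x, y

# Example: sequences of 5000 steps, matching the longest sequences reported.
x, y = adding_problem_batch(batch_size=32, seq_len=5000)
print(x.shape, y.shape)   # (32, 5000, 2) (32,)
```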
Implications
The introduction of IndRNN marks a significant step toward developing more efficient and interpretable sequential models. By removing intra-layer dependencies and addressing gradient issues through independent neurons and constrained recurrent weights, IndRNN provides a solution that can be adapted for diverse long-sequence tasks. The ability to construct deeper networks without succumbing to the typical limitations of RNNs paves the way for IndRNNs to be used in more complex applications requiring nuanced sequential understanding.
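To spell out the gradient argument behind the constrained recurrent weights, the following reproduces the per-neuron backpropagation-through-time term implied by the recurrence h_t = σ(W x_t + u ⊙ h_{t-1} + b), where u_n is the n-th neuron's recurrent weight and σ' the activation derivative (notation lightly simplified from the paper):

```latex
% Gradient of the n-th neuron's state at step T w.r.t. its state at step t:
\frac{\partial h_{n,T}}{\partial h_{n,t}}
  = \prod_{k=t}^{T-1} \sigma'_{n,k+1}\, u_n
  = u_n^{\,T-t} \prod_{k=t}^{T-1} \sigma'_{n,k+1}

% With ReLU, \sigma' \in \{0, 1\}, so the magnitude is bounded by |u_n|^{T-t};
% keeping |u_n| within an appropriate per-step range therefore prevents the
% gradient from exploding or vanishing over the T - t steps.
\left| \frac{\partial h_{n,T}}{\partial h_{n,t}} \right| \le |u_n|^{\,T-t}
```

In a standard RNN the corresponding term involves repeated products of the full recurrent weight matrix, coupling all neurons together, which is why this kind of per-neuron regulation is not possible there.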
Future Directions
The success of IndRNN in various sequence processing tasks suggests several potential areas for future exploration:
- Integration with Reinforcement Learning: IndRNNs could enhance sequential decision-making processes in reinforcement learning scenarios due to their extended memory capabilities.
- Hybrid Architectures: Combining IndRNNs with convolutional or attention mechanisms could improve performance on multimodal tasks.
- Optimization Techniques: Further research into optimization strategies specific to IndRNN could unlock even greater potential in training and application scope.
In conclusion, the IndRNN represents a promising advancement in RNN architectures, addressing persistent challenges and opening new pathways for developing high-performing, interpretable, and robust models in sequential data processing.