A Formal Hierarchy of RNN Architectures (2004.08500v4)

Published 18 Apr 2020 in cs.CL and cs.FL

Abstract: We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based on two formal properties: space complexity, which measures the RNN's memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine. We place several RNN variants within this hierarchy. For example, we prove the LSTM is not rational, which formally separates it from the related QRNN (Bradbury et al., 2016). We also show how these models' expressive capacity is expanded by stacking multiple layers or composing them with different pooling functions. Our results build on the theory of "saturated" RNNs (Merrill, 2019). While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy. Experimental findings from training unsaturated networks on formal languages support this conjecture.

Citations (71)

Summary

  • The paper introduces a formal hierarchy that categorizes RNN variants by their space complexity and rational recurrence properties.
  • Experimental results demonstrate that non-rationally recurrent models such as the LSTM can recognize certain formal languages without additional decoding layers.
  • The study highlights design implications for future neural architectures that balance computational efficiency with enhanced expressive capabilities.

A Formal Hierarchy of RNN Architectures

The paper, "A Formal Hierarchy of RNN Architectures," provides a comprehensive analysis of Recurrent Neural Network (RNN) architectures, focusing on their expressive capabilities. The authors introduce a formal hierarchy that categorizes RNN variants based on two key properties: space complexity and rational recurrence.

Hierarchical Analysis of RNN Architectures

The paper classifies RNNs using a hierarchy determined by space complexity, which measures the network's memory capacity, and rational recurrence, which asks whether the recurrent state can be described by a weighted finite-state automaton (WFA). The hierarchy helps distinguish between different RNN architectures, such as Long Short-Term Memory (LSTM) networks, Quasi-Recurrent Neural Networks (QRNNs), Elman networks, Gated Recurrent Units (GRUs), and convolutional networks (CNNs).

Space Complexity and Rational Recurrence

  1. Space Complexity: The hierarchy measures space complexity as the amount of memory, as a function of input length, needed to represent an RNN's reachable hidden-state configurations, sorting architectures into constant-, logarithmic-, and linear-space classes.
    • Constant Space Complexity: Encoders such as CNNs, saturated Elman RNNs, and GRUs have finite state memory, placing them in the constant-space class.
    • Logarithmic Space Complexity: Saturated LSTMs and QRNNs have logarithmic space complexity; their memory cells can hold counter values that grow with input length, allowing them to encode more complex patterns.
  2. Rational Recurrence: An encoder is rationally recurrent if every component of its hidden state computes a rational series, i.e., a function of the input expressible by a WFA (see the sketch after this list).
    • Rational and Non-Rational RNNs: The paper proves the LSTM is not rationally recurrent, since its gated update uses its memory cell in ways no WFA can simulate; this formally separates it from the QRNN, whose recurrence is rational by construction.
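
To make rational recurrence concrete, the sketch below (plain NumPy, not code from the paper) builds a two-state WFA whose value on a string is the number of a symbols it contains. Counting series of this kind are one example of the rational series that a rationally recurrent encoder such as the QRNN can compute per hidden unit.

```python
import numpy as np

# A two-state weighted finite-state automaton (WFA) over the reals that
# computes the series  w -> (number of 'a' symbols in w).
# Rational recurrence asks whether each hidden unit of an encoder computes
# a series of this form. Illustrative sketch only; the paper's constructions
# concern saturated networks, not this particular automaton.

init = np.array([1.0, 0.0])           # initial weight vector
final = np.array([0.0, 1.0])          # final weight vector
transition = {
    "a": np.array([[1.0, 1.0],        # reading 'a' adds 1 to the counter state
                   [0.0, 1.0]]),
    "b": np.eye(2),                   # reading 'b' leaves the state unchanged
}

def wfa_value(word: str) -> float:
    """Weight the WFA assigns to `word` (here: its count of 'a's)."""
    state = init
    for symbol in word:
        state = state @ transition[symbol]
    return float(state @ final)

assert wfa_value("aabba") == 3.0
```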

Empirical and Theoretical Insights

The paper provides experimental evidence reinforcing the theoretical hierarchy. Experiments on formal languages such as a^n b^n illustrate the models' capabilities. The LSTM, which is not rationally recurrent, can recognize such languages without an additional decoder layer, in contrast to rationally recurrent models such as the QRNN, which may require supplementary decoding layers.
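
As a rough illustration of this experimental setup (the paper's exact architectures, decoders, and training details differ), the following PyTorch sketch trains a single-layer LSTM with a linear decoder on its final state to accept strings of the form a^n b^n and reject near misses a^n b^m with m != n. All hyperparameters here are placeholders.

```python
import random
import torch
import torch.nn as nn

# Minimal sketch (not the authors' code) of a formal-language probe:
# train an LSTM to accept a^n b^n and reject a^n b^m with m != n.

VOCAB = {"a": 0, "b": 1}

def make_example(max_n: int = 12):
    """Return (token ids, label): label 1 for a^n b^n, 0 for a^n b^m with m != n."""
    n = random.randint(1, max_n)
    if random.random() < 0.5:
        m, label = n, 1
    else:
        m = random.choice([k for k in range(1, max_n + 1) if k != n])
        label = 0
    s = "a" * n + "b" * m
    return torch.tensor([VOCAB[c] for c in s]), torch.tensor(label)

class LSTMRecognizer(nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), 8)
        self.lstm = nn.LSTM(8, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)          # linear decoder on the final state

    def forward(self, ids):
        x = self.embed(ids.unsqueeze(0))         # (1, T, 8)
        _, (h, _) = self.lstm(x)                 # h: (1, 1, hidden)
        return self.out(h[-1])                   # logits over {reject, accept}

model = LSTMRecognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    ids, label = make_example()
    loss = loss_fn(model(ids), label.unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
```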

Implications and Future Directions

The distinction between rational and non-rational architectures highlighted in the hierarchy has practical implications for the design of neural networks. It suggests that incorporating features from non-rational recurrent architectures like LSTMs can enhance model capabilities, especially for tasks requiring dynamic memory or complex sequence recognition.

The authors suggest that future work might focus on designing more expressive rational RNNs that retain computational efficiency while improving state expressiveness and robustness. Extending the framework to models such as Transformers, built on self-attention, is another potential avenue for research, as initial observations suggest they are not rationally recurrent and may share space complexity characteristics with RNNs.

In conclusion, the paper successfully constructs a nuanced understanding of RNN capabilities. It advances the theoretical discourse on neural network architectures and sets a foundation for further exploration of architectures balancing expressiveness with computational efficiency.
