The Illusion of State in State-Space Models
Introduction
State-Space Models (SSMs) have been proposed as a potential improvement over transformers, particularly because of their theoretical advantages in state tracking and in handling inherently sequential computations. This claimed advantage rests on the belief that SSMs, owing to their architectural resemblance to Recurrent Neural Networks (RNNs), possess greater expressive power for contexts that demand faithful state management, such as narrative comprehension, chess move tracking, and code evaluation. This paper undertakes a comprehensive examination of this postulated advantage, critically evaluating whether SSMs indeed offer greater expressive capability than transformers for state tracking.
Theoretical Analysis
Our theoretical investigation uses the circuit complexity class TC^0 as a framework for assessing the expressive power of both transformers and SSMs, focusing in particular on their ability to maintain state information accurately. We extend existing findings to demonstrate that SSMs, like transformers, are confined to TC^0. This result is significant: it implies that SSMs, contrary to prior assertions, lack the theoretical capacity to express computations beyond TC^0. Consequently, it challenges the view that SSMs can inherently track complex sequences of states, such as those exemplified by permutation composition, an NC^1-hard problem that underlies tasks like chess move tracking and code evaluation.
Our critique extends to linear SSMs and their close generalizations, where we show that, despite their recurrent formulation, SSMs are no more capable than transformers of solving inherently sequential problems.
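To make the sequential nature of this problem concrete, the following sketch (our illustration, not a construction from the paper) shows permutation composition as a state-tracking task: the state after t inputs is the composition of the first t permutations, and each update depends on the previous state.

```python
# Permutation composition as a state-tracking task: the state after t steps
# is the composition of the first t input permutations. The word problem for
# the group S_5 is NC^1-complete, hence believed to lie outside TC^0.

def compose(p, q):
    """(p ∘ q)(i) = p[q[i]], with permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

identity = (0, 1, 2, 3, 4)
seq = [(1, 0, 2, 3, 4), (0, 2, 1, 3, 4), (4, 1, 2, 3, 0)]  # inputs in S_5

state = identity
for perm in seq:
    state = compose(perm, state)  # inherently sequential update
print(state)  # -> (2, 4, 1, 3, 0)
```

Each step reads the entire previous state, which is exactly the kind of dependency that cannot be flattened into the constant-depth parallel computation available to TC^0 models.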
Empirical Analysis
Complementing our theoretical examination, we provide an empirical analysis using the word problem for the permutation group S_5 as a representative test case. The experimental results align with our theoretical predictions: both SSMs and transformers exhibit notable difficulty with the state tracking required for permutation composition, despite their contrasting architectural designs. Particularly interesting is the performance discrepancy with RNNs, which learn to compose permutations with a single layer, in stark contrast to the limited expressive power observed in SSMs and transformers.
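As a rough sketch of how such a benchmark can be generated (our illustration; the paper's exact data pipeline may differ), each example is a sequence of random S_5 elements, and the label at position t is the running composition of the first t inputs:

```python
import itertools
import random

PERMS = list(itertools.permutations(range(5)))  # all 120 elements of S_5

def compose(p, q):
    """(p ∘ q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

def make_example(length, rng):
    """Inputs: random permutations; label at position t: composition of inputs 0..t."""
    inputs = [rng.choice(PERMS) for _ in range(length)]
    labels, state = [], tuple(range(5))
    for p in inputs:
        state = compose(p, state)
        labels.append(state)
    return inputs, labels

xs, ys = make_example(8, random.Random(0))
```

Evaluating at sequence lengths longer than those seen in training is what separates genuine state tracking from shortcut solutions that only work at bounded length.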
Proposed Extensions
In light of the limitations identified, we propose minimal extensions to SSMs that aim to close the expressiveness gap for state tracking: incorporating nonlinear activation functions and allowing input-dependent transition matrices. These modifications are theoretically able to move beyond the constraints of TC^0, enabling the models to solve permutation composition. However, these extensions warrant further scrutiny, especially concerning their practicality with respect to parallelism and learning dynamics.
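As one illustrative sketch of why input-dependent transitions help (ours, under the assumption that the transition A(x_t) can realize arbitrary permutation matrices), the recurrence h_t = A(x_t) h_{t-1} tracks the running composition exactly:

```python
import numpy as np

def perm_matrix(p):
    """Permutation matrix P with P @ e_i = e_{p[i]}."""
    m = np.zeros((len(p), len(p)))
    for i, target in enumerate(p):
        m[target, i] = 1.0
    return m

def run_ssm(perms):
    """Input-dependent recurrence h_t = A(x_t) h_{t-1}, started from h_0 = I.

    Starting from the identity matrix runs five one-hot recurrences in
    parallel, so the final state encodes the full composed permutation.
    """
    h = np.eye(5)
    for p in perms:
        h = perm_matrix(p) @ h  # transition depends on the input token
    return h

seq = [(1, 0, 2, 3, 4), (0, 2, 1, 3, 4), (4, 1, 2, 3, 0)]
composed = tuple(int(np.argmax(run_ssm(seq)[:, i])) for i in range(5))
print(composed)  # -> (2, 4, 1, 3, 0)
```

A fixed (input-independent) transition matrix cannot implement this, since the state update must be a different permutation for each input symbol; this is the intuition behind the proposed extension.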
Conclusion and Future Directions
Our comprehensive analysis dispels the illusion of statefulness in SSMs, placing them on par with transformers: both are limited to the complexity class TC^0 in their state tracking capabilities. Despite their architectural differences, the two model families exhibit similar expressiveness limitations, challenging the notion that SSMs could supplant transformers in tasks requiring intricate state management.
The findings motivate further research into SSM-like architectures that genuinely close the expressive power gap for state tracking while maintaining robust parallelizability and favorable learning dynamics. Future work could investigate practical implementations of the proposed SSM extensions and their efficacy on real-world state tracking problems, offering insight into architectural innovations that balance expressive power and computational efficiency.