Practical viability of input-dependent SSMs for large-scale language modeling

Determine whether input-dependent state-space model architectures that increase expressivity for state tracking, such as variants whose SSM transition matrix depends on the input (e.g., Input-Dependent S4 or Liquid S4), are practically viable for large-scale language modeling.

Background

The paper proves that commonly used linear SSMs, including S4 and the S6 layer used by Mamba, are limited to L-uniform TC⁰ and therefore cannot express inherently sequential computations such as permutation composition (the word problem of the symmetric group S5).
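To make the hardness benchmark concrete, the following is a minimal Python sketch of the S5 word problem: given a sequence of permutations of five elements, output their running composition. The function names (compose, word_problem) are illustrative, not from the paper. The sequential fold is exactly the state-tracking computation at issue; the S5 word problem is NC¹-complete, so TC⁰ circuits cannot solve it under standard complexity conjectures.

```python
from itertools import permutations
import random

def compose(p, q):
    # (p o q)(i) = p[q[i]]: apply q first, then p.
    return tuple(p[q[i]] for i in range(len(q)))

S5 = list(permutations(range(5)))  # all 120 elements of the symmetric group S5

def word_problem(seq):
    # Fold the composition left to right; this running product is the
    # "state" that must be tracked, and it is inherently sequential.
    state = tuple(range(5))  # identity permutation
    for perm in seq:
        state = compose(perm, state)
    return state

random.seed(0)
seq = [random.choice(S5) for _ in range(16)]
print(word_problem(seq))  # a single permutation of (0, 1, 2, 3, 4)
```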

To overcome this limitation, the authors propose a minimal architectural extension in which the SSM transition matrix depends on the input (akin to Liquid S4), and they show both theoretically and empirically that this variant can express and learn hard state-tracking tasks such as the S5 word problem. However, they note that it remains unresolved whether such more expressive SSMs can be deployed practically at the scale of modern language modeling.
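As a rough illustration (a sketch, not the authors' implementation), the following contrasts the recurrence of a standard linear SSM, whose transition matrix A is fixed, with a minimal input-dependent variant in which the transition matrix is a function of the current input, in the spirit of Liquid S4 and the paper's Input-Dependent S4. The specific parameterization A(x) = A + Wx and all dimensions are assumptions chosen for illustration.

```python
import numpy as np

d, n = 8, 16  # input and state dimensions (illustrative)
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(n, n))      # fixed transition matrix
B = 0.1 * rng.normal(size=(n, d))      # input projection
W = 0.01 * rng.normal(size=(n, n, d))  # maps input to a transition update

def fixed_step(h, x):
    # Standard linear SSM recurrence: h_t = A h_{t-1} + B x_t.
    # Because A is input-independent, the unrolled recurrence reduces
    # to fixed matrix powers plus a convolution, computable in TC^0.
    return A @ h + B @ x

def input_dependent_step(h, x):
    # Input-dependent transition A(x_t) = A + W x_t (one simple choice;
    # Liquid S4 and the paper's variant use related constructions).
    # Products of input-dependent matrices can simulate permutation
    # composition, which is what lifts expressivity beyond TC^0.
    A_x = A + np.einsum("ijd,d->ij", W, x)
    return A_x @ h + B @ x

h = np.zeros(n)
for x in rng.normal(size=(6, d)):  # a short input sequence
    h = input_dependent_step(h, x)
print(h.shape)  # (16,)
```

Part of why practicality is open is efficiency: once the transition depends on the input, the unrolled product of transition matrices can no longer be precomputed as a single convolution kernel, and the cheap parallelization enjoyed by fixed-transition SSMs is lost precisely because the added expressivity makes the computation inherently sequential.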

References

"It is an open question whether such SSM architectures with greater expressivity for state tracking are practically viable for large-scale language modeling."

The Illusion of State in State-Space Models (Merrill et al., 12 Apr 2024, arXiv:2404.08819), Introduction, final paragraph