Papers
Topics
Authors
Recent
Search
2000 character limit reached

Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models

Published 31 Aug 2024 in cs.LG, cs.SY, and eess.SY | (2409.00563v3)

Abstract: Structured state space models' (SSMs) development in recent studies, such as Mamba and Mamba2, outperformed and solved the computational inefficiency of transformers and LLMs at small to medium scale. In this work, we introduce the concept of controllability and observability to the original Mamba SSM's architecture in our Sparse-Mamba (S-Mamba) for NLP applications. Moreover, we reinforce stability on the $nxn$ $A$ matrix on Mmaba2. The Mamba SSMs architecture drops the need for attention layers or multilayer perception blocks in transformers. However, current Mamba models lack reinforcement of controllability in state-space equations for computing the $A$, $B$, $C$, and $D$ matrices at each time step, leading to increased complexity and computational costs. Furthermore, the $A$ matrix in Mamba2 is not always stable. We demonstrate a reduction of parameters compared to the first published Mamba and Mamba2. We showcase an improvement in perplexity by 5\% and a decrease in training time by 3\% after reinforcing controllability and observability on the original Mamba architecture in our proposed S-Mamba. We further enforce stability on the $A$ matrix in Mamba2 to improve the loss and perplexity of the model. The controllable and stable $n \times n$ state matrix $A$ is sparse, and it has only $n$ free parameters. Our novel approach will ensure controllable/observable and stable SSMs, which will be the gate key for Mamba3.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.