
Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models (2409.00563v3)

Published 31 Aug 2024 in cs.LG, cs.SY, and eess.SY

Abstract: Recent developments in structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and addressed their computational inefficiency at small to medium scale. In this work, we introduce the concepts of controllability and observability into the original Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for NLP applications. Moreover, we enforce stability on the $n \times n$ $A$ matrix in Mamba2. The Mamba SSM architecture drops the need for the attention layers and multilayer perceptron blocks of transformers. However, current Mamba models do not enforce controllability in the state-space equations used to compute the $A$, $B$, $C$, and $D$ matrices at each time step, leading to increased complexity and computational cost. Furthermore, the $A$ matrix in Mamba2 is not always stable. We demonstrate a reduction in parameters compared to the first published Mamba and Mamba2, a 5\% improvement in perplexity, and a 3\% decrease in training time after enforcing controllability and observability on the original Mamba architecture in our proposed S-Mamba. We further enforce stability on the $A$ matrix in Mamba2 to improve the model's loss and perplexity. The controllable and stable $n \times n$ state matrix $A$ is sparse, with only $n$ free parameters. Our approach ensures controllable/observable and stable SSMs, paving the way for Mamba3.
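The abstract describes a controllable, stable state matrix $A$ that is sparse with only $n$ free parameters. One standard construction with exactly this property is the controllable canonical (companion) form, where the sub-diagonal is fixed to ones and only the last row carries learnable coefficients. The sketch below is illustrative, not the paper's actual implementation; the function names and the discrete-time stability criterion (all eigenvalues inside the unit circle) are assumptions for the example:

```python
import numpy as np

def companion_A(coeffs):
    """Sparse n x n state matrix in controllable canonical (companion) form.

    Only the n entries of `coeffs` are free parameters; the rest of the
    matrix is a fixed shifted identity, so A has at most 2n - 1 nonzeros.
    """
    n = len(coeffs)
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)       # ones on the super-diagonal (fixed)
    A[-1, :] = -np.asarray(coeffs)   # last row holds the n free parameters
    return A

def is_stable(A):
    """Discrete-time stability check: spectral radius strictly below 1."""
    return bool(np.all(np.abs(np.linalg.eigvals(A)) < 1.0))

A = companion_A([0.1, -0.2, 0.3, 0.05])
print(A.shape)              # (4, 4)
print(np.count_nonzero(A))  # 7, i.e. 2n - 1 for n = 4
```

Because all structure except the last row is fixed, controllability is guaranteed by construction, and stability can be enforced by constraining the learned coefficients so the companion matrix's spectral radius stays below one.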
