Simplified Mamba-based Architecture for Vision and Multivariate Time Series Analysis
Introduction
Recent advances in deep learning have highlighted the effectiveness of Transformer models in handling sequential data across domains including NLP and computer vision. However, the quadratic complexity of multi-head self-attention (MHSA) makes these models difficult to scale, particularly for long sequences. In response, State Space Models (SSMs) such as S4 and, more recently, Mamba have emerged as potent alternatives. This paper introduces SiMBA, a simplified Mamba-based architecture that combines Einstein FFT (EinFFT) for channel modeling with the Mamba block for sequence modeling. Extensive evaluation shows that SiMBA outperforms both existing SSMs and state-of-the-art transformers across a range of benchmarks.
Motivation and Background
Transformers and SSMs have become pivotal for processing sequential data because of their ability to capture long-range dependencies. However, MHSA's quadratic computational cost with respect to sequence length motivates the exploration of SSMs. Mamba, a recent SSM, addresses the efficiency and inductive-bias limitations of transformers by introducing a selective state space mechanism that propagates information in an input-dependent manner. Despite these innovations, Mamba suffers from training instability when scaled to large network sizes, especially in computer vision tasks. To overcome these obstacles, we propose SiMBA, which incorporates a novel channel modeling technique, EinFFT, to improve stability and performance.
Model Architecture
SiMBA innovates by introducing EinFFT for spectral channel mixing, complementing the Mamba block's selective state space approach for sequence modeling. This combination addresses both the quadratic complexity problem and the stability issues observed with Mamba, positioning SiMBA as a leading architecture for processing long sequences.
- Sequence Modeling with Mamba: SiMBA adopts the Mamba block as a modular sequence mixer, using its selective state space mechanism to capture long-range dependencies efficiently; a block-level sketch that combines the two mixers appears after this list.
- Channel Modeling with EinFFT: EinFFT is the pivotal innovation in SiMBA, designed specifically to tackle the channel-modeling challenges that arise with Mamba. By pairing Fourier transforms with complex-valued eigenvalue computations, EinFFT improves the model's stability and its ability to capture channel information; a minimal sketch of the channel mixer follows this list.
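To make the channel-mixing idea concrete, the following is a minimal PyTorch sketch of an EinFFT-style channel mixer: a real FFT over the token dimension, block-diagonal ("Einstein") complex-valued matrix multiplications in the frequency domain with a simple nonlinearity, and an inverse FFT. The module and parameter names, the FFT axis, the block layout, and the nonlinearity are illustrative assumptions, not the paper's exact implementation.

    # Minimal sketch of an EinFFT-style spectral channel mixer (illustrative only).
    # Assumptions: rFFT over the token dimension, block-diagonal ("Einstein")
    # complex-valued weights, and a ReLU applied separately to real/imaginary parts.
    import torch
    import torch.nn as nn


    class EinFFTChannelMixer(nn.Module):
        def __init__(self, dim: int, num_blocks: int = 4):
            super().__init__()
            assert dim % num_blocks == 0, "dim must be divisible by num_blocks"
            self.num_blocks = num_blocks
            self.block_size = dim // num_blocks
            # Complex block-diagonal weights stored as stacked real/imaginary parts.
            self.w1 = nn.Parameter(0.02 * torch.randn(2, num_blocks, self.block_size, self.block_size))
            self.w2 = nn.Parameter(0.02 * torch.randn(2, num_blocks, self.block_size, self.block_size))

        def _emm(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
            # Einstein matrix multiplication: each channel block gets its own complex weight.
            return torch.einsum("bfnd,nde->bfne", x, torch.complex(w[0], w[1]))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, channels), real-valued
            b, n, c = x.shape
            x_f = torch.fft.rfft(x, dim=1, norm="ortho")                     # to frequency domain
            x_f = x_f.reshape(b, x_f.shape[1], self.num_blocks, self.block_size)
            x_f = self._emm(x_f, self.w1)
            x_f = torch.complex(torch.relu(x_f.real), torch.relu(x_f.imag))  # simple complex nonlinearity
            x_f = self._emm(x_f, self.w2)
            x_f = x_f.reshape(b, x_f.shape[1], c)
            return torch.fft.irfft(x_f, n=n, dim=1, norm="ortho")            # back to token domain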
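Building on that component, a SiMBA block can be sketched as a pre-norm residual pair of mixers: a Mamba module for sequence (token) mixing followed by EinFFT for channel mixing. The Mamba import assumes the third-party mamba_ssm package, and the normalization and ordering choices here are illustrative assumptions rather than the paper's exact configuration; the channel mixer reuses the EinFFTChannelMixer sketched above.

    # Sketch of a SiMBA block: Mamba for sequence mixing, EinFFT for channel mixing,
    # each in a pre-norm residual branch. The mamba_ssm dependency and the exact
    # norm/ordering choices are assumptions for illustration.
    import torch.nn as nn
    from mamba_ssm import Mamba  # assumed third-party package (pip install mamba-ssm)


    class SiMBABlock(nn.Module):
        def __init__(self, dim: int, num_blocks: int = 4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.seq_mixer = Mamba(d_model=dim)                         # sequence modeling
            self.norm2 = nn.LayerNorm(dim)
            self.channel_mixer = EinFFTChannelMixer(dim, num_blocks)    # channel modeling (sketched above)

        def forward(self, x):
            # x: (batch, tokens, channels)
            x = x + self.seq_mixer(self.norm1(x))        # token mixing via selective SSM
            x = x + self.channel_mixer(self.norm2(x))    # spectral channel mixing
            return x

A full backbone would then stack such blocks behind a patch-embedding (vision) or series-embedding (forecasting) layer and attach a task-specific head.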
Main Contributions
- Stability and Performance: SiMBA's primary contribution lies in its ability to maintain model stability while scaling to large networks, surpassing Mamba and other SSMs in performance benchmarks.
- Cross-Domain Efficacy: Extensive testing on both image and time-series datasets establishes SiMBA's versatility, demonstrating its superior performance compared to leading attention-based transformers and traditional SSMs across multiple domains.
- EinFFT for Channel Modeling: The introduction of EinFFT represents a significant advancement in the field of state space modeling, offering a robust solution for spectral channel mixing that solves the previously noted stability issues.
Experimental Results
SiMBA achieves new state-of-the-art results across numerous benchmarks, including notable gains on ImageNet classification and on several time series forecasting tasks. Compared with attention-based models and other SSMs, SiMBA not only delivers superior accuracy but also exhibits strong efficiency and generalization on transfer learning tasks.
Future Directions
This paper paves the way for future investigations into alternative sequence and channel modeling techniques within the SiMBA framework. The adaptability of SiMBA suggests potential enhancements through the exploration of different structural configurations, promising further advancements in both theoretical and practical applications of deep learning for sequential data processing.
Conclusion
SiMBA addresses the critical challenges faced by both traditional transformers and recent SSMs, offering a robust solution that combines the strengths of Mamba's selective state space mechanism and the novel EinFFT channel modeling technique. The model's exceptional performance across a diverse set of benchmarks underscores its potential as a transformative approach in the domain of sequential data processing.