Impact of Absent Attention Mechanism on Mamba’s In-Context Learning
Ascertain whether the absence of an attention mechanism in pure Mamba state-space models impairs in-context learning.
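To make the question concrete, the sketch below shows a standard few-shot in-context learning probe of the kind such an investigation would run on a pure Mamba model. It is a minimal illustration only: the checkpoint name, the toy key-to-value task, and the use of the Hugging Face transformers causal-LM interface are assumptions, not details taken from the Jamba paper.

```python
# Minimal few-shot in-context learning probe (illustrative sketch).
# Assumes a Hugging Face-style causal LM; the checkpoint id below is an
# assumption and can be swapped for any pure Mamba or hybrid model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "state-spaces/mamba-130m-hf"  # assumed checkpoint, not from the source

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def icl_prompt(pairs, query):
    """Format k in-context (input -> label) demonstrations followed by a query."""
    shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in pairs)
    return f"{shots}\nInput: {query}\nLabel:"

# Toy task: infer the mapping (country -> capital) from three in-context demonstrations.
demos = [("France", "Paris"), ("Japan", "Tokyo"), ("Italy", "Rome")]
prompt = icl_prompt(demos, "Germany")

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens; a model that learns in-context
# should continue the demonstrated format with the correct label.
completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(completion.strip())
```

Comparing accuracy on such probes between a pure Mamba model and an attention-bearing hybrid of similar size is one way to test the conjecture quoted below.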
References
We conjecture that the lack of an attention mechanism in the pure Mamba model makes it difficult for it to learn in-context.
— Jamba: A Hybrid Transformer-Mamba Language Model
(arXiv:2403.19887, Lieber et al., 28 Mar 2024), Section 5.2 ("Why does the Combination Work?")