Emergent In-Context Learning in State-Space Models

Determine whether in-context learning (ICL) is an emergent capability in state-space models (SSMs) such as Mamba, comparable to the emergent ICL observed in Transformer language models.
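One way to make this question measurable is to probe how a pretrained model's preference for the correct answer changes as in-context demonstrations are added. The sketch below is a minimal, hedged illustration of such a probe; the checkpoint id (state-spaces/mamba-130m-hf), the toy sentiment task, and the log-probability scoring rule are assumptions made for illustration, not the paper's evaluation protocol.

```python
# Minimal ICL probe sketch (illustrative assumptions: checkpoint id, toy task,
# log-probability scoring). Any causal LM checkpoint can be substituted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "state-spaces/mamba-130m-hf"  # assumed SSM checkpoint on the HF Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

# Toy sentiment task used only to illustrate few-shot prompting.
demos = [
    ("The movie was wonderful.", "positive"),
    ("I hated every minute.", "negative"),
    ("A delightful surprise.", "positive"),
    ("Dull and far too long.", "negative"),
]
query_text, query_label = "An absolute joy to watch.", "positive"

def label_logprob(prompt: str, label: str) -> float:
    """Sum of log-probabilities the model assigns to the label tokens."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    label_ids = tokenizer(" " + label, add_special_tokens=False,
                          return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, label_ids], dim=-1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict token t + 1, hence the shift below.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    n = label_ids.shape[-1]
    targets = input_ids[0, -n:]
    return logprobs[-n:].gather(-1, targets.unsqueeze(-1)).sum().item()

for k in range(len(demos) + 1):
    context = "".join(f"Review: {t}\nSentiment: {l}\n\n" for t, l in demos[:k])
    prompt = context + f"Review: {query_text}\nSentiment:"
    gap = label_logprob(prompt, query_label) - label_logprob(prompt, "negative")
    print(f"{k}-shot: logprob(correct) - logprob(incorrect) = {gap:.3f}")
```

If the gap widens as demonstrations are added only beyond some model scale, that scale dependence would be one signature of the emergence the question asks about; a flat curve at all scales would suggest ICL does not emerge in SSMs the way it does in Transformers.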

Background

The Jamba paper compares a pure Transformer, a pure Mamba, and a hybrid Attention-Mamba architecture. While pure Mamba performs well on many tasks, it notably underperforms on benchmarks such as IMDB, QuAC, and NarrativeQA, where following the expected input-output format, a behavior associated with in-context learning, is important.
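For orientation, here is a self-contained sketch of the interleaving idea behind a hybrid Attention-Mamba stack. It is not Jamba's implementation (Jamba additionally uses mixture-of-experts layers and a particular attention-to-Mamba layer ratio); the state-space block below is a toy per-channel linear recurrence standing in for a real Mamba layer, and all sizes are arbitrary.

```python
# Toy hybrid stack: attention blocks interleaved with simplified SSM blocks.
# Illustrative only; not Jamba's actual architecture or layer ratio.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba layer: per-channel recurrence h_t = a * h_{t-1} + b * x_t."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.a = nn.Parameter(torch.rand(d_model) * 0.9)  # per-channel decay
        self.b = nn.Parameter(torch.ones(d_model))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        u = self.norm(x)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.shape[1]):            # sequential scan; real SSMs use a parallel scan
            h = self.a * h + self.b * u[:, t]
            states.append(h)
        return x + self.out(torch.stack(states, dim=1))

class AttentionBlock(nn.Module):
    """Causal self-attention block with a residual connection."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        u = self.norm(x)
        mask = nn.Transformer.generate_square_subsequent_mask(x.shape[1])
        y, _ = self.attn(u, u, u, attn_mask=mask)
        return x + y

class HybridStack(nn.Module):
    """One attention block for every `ratio` layers; the rest are SSM blocks."""
    def __init__(self, d_model: int = 64, n_layers: int = 8, ratio: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if i % ratio == 0 else ToySSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 64)                     # (batch, seq, d_model)
print(HybridStack()(x).shape)                  # torch.Size([2, 16, 64])
```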

The authors observe induction-like patterns in the hybrid model's attention layers, but they explicitly note that it is not clear whether in-context learning emerges in attention-free SSMs as it typically does in Transformers, leaving this as an open question for future investigation.
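A common diagnostic for induction-like behavior in attention maps is to measure, for each query token, how much attention lands on the position right after an earlier occurrence of the same token. The sketch below computes such a score on synthetic data; the scoring rule is a generic heuristic assumed here, not the paper's measurement, and the attention matrix is random rather than taken from a trained model.

```python
# Induction-pattern score sketch: how much attention does position i place on
# positions j + 1 where token[j] == token[i] appeared earlier in the sequence?
# Heuristic diagnostic on synthetic data, not the paper's measurement.
import torch

def induction_score(tokens: torch.Tensor, attn: torch.Tensor) -> float:
    """tokens: (seq,) token ids; attn: (seq, seq) head weights, rows = queries."""
    seq = tokens.shape[0]
    total, count = 0.0, 0
    for i in range(1, seq):
        # Earlier positions j whose token matches token[i]; an induction head
        # should attend from i to j + 1, the token that followed the match.
        targets = [j + 1 for j in range(i - 1) if tokens[j] == tokens[i]]
        if targets:
            total += attn[i, targets].sum().item()
            count += 1
    return total / max(count, 1)

# Demo on a repeated random sequence, where induction behaviour is detectable.
torch.manual_seed(0)
half = torch.randint(0, 50, (16,))
tokens = torch.cat([half, half])                      # [A B C ... A B C ...]
attn = torch.rand(32, 32).tril()                      # random causal "attention"
attn = attn / attn.sum(-1, keepdim=True)
print(f"induction score of a random head: {induction_score(tokens, attn):.3f}")
```

A trained induction head would score far above this random baseline. Applying the same measurement to an attention-free SSM is not directly possible, since there are no attention maps to inspect, which is part of why the question remains open.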

References

While Mamba may learn to copy and perform simple ICL when explicitly trained to do so, it is not clear if ICL is an emergent capability in SSM as is typical of Transformer models.

Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887, Lieber et al., 28 Mar 2024), Section 5.2 (Why does the Combination Work?)