Principled design of scaffolding around SSM layers

Determine principled, theoretically grounded design rules for the pre-processing and post-processing scaffolding that surrounds discrete-time linear state-space layers in sequence modeling architectures (including S4, S4D, S5, LRU, S6/Mamba, and RG-LRU), specifying when particular gating mechanisms and mapping choices should be used and why they yield superior performance.

Background

State space model (SSM) architectures do not use the linear recurrence in isolation; they rely on additional pre-processing of inputs and post-processing (often with gating) of outputs, collectively termed “scaffolding.” Several scaffolding variants have been proposed in the literature, such as MLP-style blocks, H3, and the Mamba scaffolding, but their selection is currently driven by empirical performance rather than theory.
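To make the term concrete, the following is a minimal NumPy sketch of a linear recurrence wrapped in scaffolding: a linear pre-processing map, a diagonal discrete-time recurrence, and a multiplicative output gate followed by an output projection. It is a generic gated pattern chosen for illustration, not a faithful reproduction of the H3 or Mamba blocks; the function names, the SiLU gate, the diagonal parameterization, and all shapes are assumptions that do not come from the cited paper.

```python
import numpy as np

def diagonal_ssm(u, a, b, c):
    """Diagonal discrete-time linear recurrence, applied per channel.

    u: (T, D) input sequence; a, b, c: (D, N) parameters.
    x_k = a * x_{k-1} + b * u_k,   y_k = sum over N of c * x_k
    """
    T, D = u.shape
    x = np.zeros_like(a)               # hidden state, shape (D, N)
    y = np.empty((T, D))
    for k in range(T):
        x = a * x + b * u[k][:, None]  # state update
        y[k] = (c * x).sum(axis=1)     # readout
    return y

def silu(z):
    # SiLU nonlinearity, used here as the gate activation (an assumption).
    return z / (1.0 + np.exp(-z))

def gated_block(u, W_in, W_gate, W_out, a, b, c):
    """Hypothetical scaffolding around the recurrence."""
    h = u @ W_in                  # pre-processing: linear map into the SSM width
    g = silu(u @ W_gate)          # gate branch, computed pointwise in time
    y = diagonal_ssm(h, a, b, c)  # the linear recurrence itself
    return (y * g) @ W_out        # post-processing: gate, then project back

rng = np.random.default_rng(0)
T, D_model, D_inner, N = 16, 4, 8, 3
u = rng.standard_normal((T, D_model))
W_in = rng.standard_normal((D_model, D_inner)) / np.sqrt(D_model)
W_gate = rng.standard_normal((D_model, D_inner)) / np.sqrt(D_model)
W_out = rng.standard_normal((D_inner, D_model)) / np.sqrt(D_inner)
a = rng.uniform(0.5, 0.95, size=(D_inner, N))  # |a| < 1 keeps the recurrence stable
b = rng.standard_normal((D_inner, N))
c = rng.standard_normal((D_inner, N))

out = gated_block(u, W_in, W_gate, W_out, a, b, c)
```

Everything outside `diagonal_ssm` is scaffolding in the sense used here. Swapping the gate activation, moving the gate before the recurrence, or replacing the linear maps with an MLP changes the block without touching the recurrence, and it is choices of this kind for which the problem statement asks for principled rules.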

Alonso et al. explicitly note that the choice and design of these scaffolding components are not well understood, which indicates a need for formal guidance on how to construct and select them for different tasks and model parameterizations.

References

"It is important to note that the choice and design of the scaffolding is not well-understood, and often the one that is most performant in practice is selected."

State Space Models as Foundation Models: A Control Theoretic Overview (2403.16899 - Alonso et al., 25 Mar 2024) in Section 2.6, Scaffolding and Layers