Locally Bi-Directional SSMs
- Locally Bi-Directional State-Space Models (LB-SSMs) are neural architectures that combine forward scanning with local backward recurrences within fixed-size windows, enabling efficient bidirectional context aggregation.
- They employ a dual-pass approach that restricts the backward scan to local blocks, reducing computational overhead compared to full-sequence bi-directional models while maintaining near-global receptive fields.
- LB-SSMs are effectively applied in vision, time series imputation, remote sensing, and biomedical signal processing, offering notable improvements in accuracy and throughput with minimal runtime impact.
Locally Bi-Directional State-Space Models (LB-SSMs) are a class of neural architectures wherein state space models (SSMs) are endowed with local bi-directionality, enabling each token or location to aggregate contextual information from both forward and backward directions, but only within a restricted (local) window. Such models attain much of the empirical performance gain of bidirectional models, yet preserve the computational efficiency of unidirectional linear-time SSMs, circumventing the quadratic or double-pass cost associated with global bi-directional sweeps. This design is foundational to recent advances in vision, time series, remote sensing, and multivariate signal processing, offering linear complexity, improved receptive fields, and compatibility with modern GPU architectures (Zhang et al., 19 Jun 2025, Zhang et al., 3 Sep 2025, Wu et al., 26 Jan 2025, Solís-García et al., 2024).
1. Core Mathematical Structures of Locally Bi-Directional SSMs
LB-SSMs are generally formulated by augmenting a standard unidirectional selective SSM scan with a local backward recurrence, executed over small blocks (tiles) or patches. Let $x_t$ denote the $t$-th input, and $h^{f}_t$, $h^{b}_t$ the forward and backward hidden state vectors in $\mathbb{R}^{N}$, respectively.
Selective SSM Parameters
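At each position, input-dependent parameters of the selective SSM (the step size and the input and output projections) are computed from the input at that position through learned linear maps (Zhang et al., 19 Jun 2025).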
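For concreteness, the selective parameterization can be written in the form used by Mamba-style SSMs. This is the standard Mamba form, assumed here rather than taken from the LB-SSM papers. With $x_t$ the input at position $t$, the symbols $W_\Delta$, $b_\Delta$, $W_B$, $W_C$, the state matrix $A$, and the discretized quantities $\bar{A}_t$, $\bar{B}_t$ are notation introduced for illustration:

$$
\begin{aligned}
\Delta_t &= \operatorname{softplus}(W_\Delta x_t + b_\Delta), \\
B_t &= W_B x_t, \qquad C_t = W_C x_t, \\
\bar{A}_t &= \exp(\Delta_t A), \qquad \bar{B}_t = \Delta_t B_t .
\end{aligned}
$$

Because $\Delta_t$, $B_t$, and $C_t$ depend on $x_t$, the recurrence can decide per position how much past state to keep and how much of the current input to write.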
Forward and Local Backward Recurrences
Sequences are partitioned into blocks of size $M$, chosen to match per-thread register tiles in parallel hardware. The forward scan is global, while the backward pass is restricted to each block. At each position, the fused output combines the forward state with the block-local backward state.
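One way to write the two recurrences is sketched below. The notation is assumed for illustration: $\bar{A}_t$ and $\bar{B}_t x_t$ are the discretized transition and input term at position $t$, $C_t$ is the readout, and the backward state is reset at the right edge of each block. The fusion rule, which subtracts one copy of $\bar{B}_t x_t$ because the current input enters both states, is also an assumption of this sketch rather than a confirmed detail of any cited model:

$$
\begin{aligned}
h^{f}_t &= \bar{A}_t\, h^{f}_{t-1} + \bar{B}_t x_t, \qquad h^{f}_0 = 0, \\
h^{b}_t &=
\begin{cases}
\bar{B}_t x_t, & t \text{ is the last position of its block}, \\
\bar{A}_t\, h^{b}_{t+1} + \bar{B}_t x_t, & \text{otherwise},
\end{cases} \\
y_t &= C_t \left( h^{f}_t + h^{b}_t - \bar{B}_t x_t \right).
\end{aligned}
$$

With a block size of one the backward state reduces to $\bar{B}_t x_t$ and the output equals that of a purely forward scan; with a single block spanning the whole sequence the scheme becomes a full bi-directional scan.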
Generalization
Extensions integrate bi-directionality through separate forward and backward SSM passes (with learned or tied parameters), local windowing, and gating mechanisms, in both temporal and spatial dimensions (Zhang et al., 3 Sep 2025, Solís-García et al., 2024). Multi-branch architectures apply the same idea to multi-channel or multi-sensor signals (e.g., ECG leads).
2. Efficient GPU Implementation and Linear Complexity
LB-SSM architectures are specifically tailored for modern GPU memory hierarchies. The backward scan is performed entirely in per-thread registers after the forward tile scan, requiring no extra global memory traffic or synchronization. This results in a minor arithmetic overhead (427%) and a negligible wall-clock runtime increase (52%), compared with single forward-scan SSMs, whereas a naïve full-sequence bi-directional approach would approximately double the time and bandwidth requirements (Zhang et al., 19 Jun 2025).
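The per-block procedure is: run the forward scan over the tile, run the backward scan over the same tile while its values are still held in registers, then fuse the two states.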
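A minimal NumPy sketch of this forward-then-local-backward procedure follows. It is a sequential reference for checking the arithmetic, not the fused GPU kernel. It assumes a diagonal SSM with precomputed per-position arrays, and the names `lb_scan`, `a`, `bx`, `c`, and `block` are hypothetical. The fusion step, which subtracts the duplicated input term, is an assumption of the sketch.

```python
import numpy as np

def lb_scan(a, bx, c, block):
    """Global forward scan plus block-local backward scan for a diagonal SSM.

    a, bx, c: arrays of shape (L, N) holding the discretized transition,
    the input term (B_bar_t * x_t), and the readout C_t at each position.
    block: tile size; the backward state never crosses a tile boundary.
    Returns y with shape (L,).
    """
    L, N = a.shape
    h_f = np.zeros((L, N))
    h_b = np.zeros((L, N))

    # Forward recurrence: the state is carried across block boundaries.
    state = np.zeros(N)
    for t in range(L):
        state = a[t] * state + bx[t]
        h_f[t] = state

    # Backward recurrence: the state is reset at the right edge of each block.
    for start in range(0, L, block):
        end = min(start + block, L)
        state = np.zeros(N)
        for t in range(end - 1, start - 1, -1):
            state = a[t] * state + bx[t]
            h_b[t] = state

    # bx[t] is contained in both h_f[t] and h_b[t]; subtract one copy.
    h = h_f + h_b - bx
    return np.sum(c * h, axis=1)
```

Calling `lb_scan(a, bx, c, block=1)` reproduces a purely forward scan, which is a convenient sanity check when comparing against a unidirectional baseline.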
This register-only formulation keeps both arithmetic cost and memory usage linear in the sequence length, $O(L)$, with only a small additive per-thread register footprint. This contrasts sharply with the $O(L^2)$ scaling of self-attention in transformers (Zhang et al., 19 Jun 2025, Solís-García et al., 2024).
3. Architectural Variants and Application-Specific Instantiations
Vision Backbones (LBVim)
LBVim alternates scan direction at the end of each LBMamba block: after every block, the sequence order is reversed. Stacking such blocks yields a global receptive field, with no global backward scan ever required. This strategy recovers information flow between all token pairs and avoids the throughput degradation typical of double-sweep bi-directional models (Zhang et al., 19 Jun 2025).
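The effect of alternating the scan direction can be checked with a small dependency-mask calculation. The sketch below is a simplified model, not LBVim itself: each layer lets position `i` see every earlier position plus all positions in its own block, and odd-numbered layers apply the same rule to the reversed sequence. The function names and the block size used are hypothetical.

```python
import numpy as np

def layer_mask(L, block):
    """mask[i, j] is True if a layer's output at i can depend on its input at j."""
    idx = np.arange(L)
    forward = idx[None, :] <= idx[:, None]                        # j <= i
    same_block = (idx[None, :] // block) == (idx[:, None] // block)
    return forward | same_block                                   # global forward + local backward

def stacked_receptive_field(L, block, layers):
    """Compose per-layer masks, reversing the token order between layers."""
    base = layer_mask(L, block)
    total = np.eye(L, dtype=bool)
    for k in range(layers):
        # A layer applied to the reversed sequence has the mask flipped on both axes.
        m = base if k % 2 == 0 else base[::-1, ::-1]
        total = (m.astype(int) @ total.astype(int)) > 0
    return total

one = stacked_receptive_field(12, 4, 1)
two = stacked_receptive_field(12, 4, 2)
print(one.all(), two.all())
```

In this model a single layer leaves early tokens blind to later blocks, while two layers with opposite directions already connect every pair of positions, because the last token of the first layer has seen the whole sequence and every token of the second layer can read it.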
Multi-Lead and Multi-Branch Designs
In multi-sensor time series, each branch (e.g., ECG lead) uses independent bi-directional SSM blocks, followed by temporal and spatial fusion (e.g., SENet) modules. Long input sequences are tokenized into segments/patches, each processed locally in both directions. Outputs are gated, summed, and passed to higher-level fusion (Zhang et al., 3 Sep 2025).
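As an illustration of the fusion step, the sketch below applies squeeze-and-excitation style gating across branches and then sums them. It shows the generic SE mechanism only; the exact fusion modules of the cited ECG model are not reproduced here, and `se_fuse`, `w1`, and `w2` are hypothetical names.

```python
import numpy as np

def se_fuse(feats, w1, w2):
    """Squeeze-and-excitation style fusion over branches.

    feats: (branches, T, D) features, one slice per branch (e.g., per lead).
    w1: (branches, hidden) and w2: (hidden, branches) bottleneck weights.
    Returns fused features of shape (T, D).
    """
    squeeze = feats.mean(axis=(1, 2))                 # one summary scalar per branch
    hidden = np.maximum(squeeze @ w1, 0.0)            # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))       # sigmoid weight per branch, in (0, 1)
    return np.tensordot(gate, feats, axes=(0, 0))     # gated sum over branches
```

The gate lets the model down-weight uninformative branches before summation, which is the role a channel-attention module plays in multi-lead settings.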
Windowed and Locally Adaptive Scans
In remote sensing and imputation tasks, local windowing is combined with SSM scanning. CD-Lamba employs:
- Locally Adaptive State-Space Scan (LASS): partitions features spatially into dynamic blocks (via Gumbel-softmax selection).
- Cross-Temporal State-Space Scan (CTSS): interleaves pixels from co-located pre/post images into a bi-directional SSM scan.
- Window Shifting and Perception (WSP): shifts window assignment to ensure cross-boundary information flow (Wu et al., 26 Jan 2025).
In all settings, bi-directionality is realized via local or patchwise SSM reversals, preventing loss of context without global computation.
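Two of the ideas above, cross-temporal interleaving and window shifting, can be sketched on flat token sequences. This is a simplified illustration under stated assumptions: tokens are already flattened to shape `(L, D)`, the shift is cyclic, and the helper names `interleave` and `shifted_windows` are hypothetical. The actual CD-Lamba modules may differ in detail.

```python
import numpy as np

def interleave(pre, post):
    """Interleave two (L, D) token sequences as pre[0], post[0], pre[1], post[1], ..."""
    L, D = pre.shape
    out = np.empty((2 * L, D), dtype=pre.dtype)
    out[0::2] = pre    # even slots hold the pre-change tokens
    out[1::2] = post   # odd slots hold the co-located post-change tokens
    return out

def shifted_windows(L, window, shift):
    """Window index of each position after a cyclic shift of the window grid."""
    return ((np.arange(L) + shift) % L) // window

pre = np.array([[0], [1]])
post = np.array([[10], [11]])
print(interleave(pre, post).ravel())   # [ 0 10  1 11]
print(shifted_windows(8, 4, 0))        # [0 0 0 0 1 1 1 1]
print(shifted_windows(8, 4, 2))        # [0 0 1 1 1 1 0 0]
```

After interleaving, co-located pre and post tokens are adjacent, so a local scan in either direction mixes the two acquisition times. With the shifted grid, positions 3 and 4 fall in the same window even though the unshifted grid separates them, which is how shifting passes information across window boundaries.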
4. Comparative Computational and Empirical Properties
| Model | Directionality | Complexity | Context Aggregation | Efficiency | Use Cases |
|---|---|---|---|---|---|
| Mamba | Unidirectional | $O(L)$ | Past-only (forward) | High | Vision, time series, WSI |
| Bi-Mamba | Global Bi-dir | $O(L)$, about 2× cost (two global scans) | Full sequence | Lower | Vision, time series |
| LBMamba | Local Bi-dir | $O(L)$ | Global (via alternation) | High, with broad context | Vision, WSI, real-time low-latency tasks |
Empirically, LBVim backbones built on LBMamba achieve 5–6 percentage point (pp) higher accuracy on ImageNet-1K and ADE20K for the same throughput, with 7–8% throughput compared to standard Vim backbones (Zhang et al., 19 Jun 2025). Similarly, S2M2ECG’s locally bi-directional SSMs deliver 9–0 F1 point improvements in ECG tasks with 1M parameters (Zhang et al., 3 Sep 2025), and TIMBA imputation achieves consistent MAE/MSE reductions under high missingness in time-series data (Solís-García et al., 2024).
5. Methodological Insights and Domain-Specific Considerations
LB-SSMs exploit the inherent locality and sequentiality of specific domains. In vision, local backward scans within register tiles allow large-scale images (including gigapixel WSI) to be processed efficiently without global backward passes. In multi-channel ECG, patch-based bi-directional SSMs yield biologically relevant context aggregation (e.g., Q-T interval features) and robust generalization across databases (Zhang et al., 3 Sep 2025). For change detection, locality-preserving scans identify and separate foreground/background dynamics more effectively than conventional global scans (Wu et al., 26 Jan 2025). In time series imputation, dual local S6 recurrences propagate gradients bidirectionally, mitigating vanishing/exploding phenomena and strengthening missing region reconstruction (Solís-García et al., 2024).
A plausible implication is that the degree and granularity of local bi-directionality (e.g., block size or patch size) should be tuned to match task-specific patterns: larger patches for long-range rhythm, smaller ones for localized morphology (Zhang et al., 3 Sep 2025).
6. Current Limitations and Research Directions
While locally bi-directional SSMs restore reciprocal context with minimal overhead, some architectural tuning remains application-specific (e.g., block size, scan alternation frequency). For highly structured non-local phenomena, pure locality may be insufficient, requiring hierarchical or multi-scale scans (window shifting, cross-temporal fusions). Research is ongoing to generalize LBMamba block designs to video, multivariate anomaly detection, and cross-modal tasks, and to learn adaptive segmentation for windowed processing rather than relying on fixed-size tilings (Wu et al., 26 Jan 2025). Other open areas are extending 4-way cross-temporal SSMs to multi-temporal inputs and making window and scan parameters fully end-to-end differentiable.
7. Summary and Outlook
Locally bi-directional state-space models combine dynamic state transition modeling with efficient local-scope bi-directional aggregation. By restructuring forward SSM scans to embed lightweight, register-limited backward recurrences, these models achieve favorable accuracy-throughput trade-offs while restoring bidirectional, and through alternation or cross-window fusion even global, receptive fields. Empirical results across vision, time series imputation, remote sensing, and multivariate biomedical signal processing report improved accuracy and strong generalization at parameter and time costs equal to or only marginally above those of unidirectional models. The LB-SSM paradigm thus offers a broad, efficient foundation for future state-space and hybrid sequence modeling architectures (Zhang et al., 19 Jun 2025, Zhang et al., 3 Sep 2025, Wu et al., 26 Jan 2025, Solís-García et al., 2024).