Omni Selective Scan (OSS) for Vision SSMs
- Omni Selective Scan (OSS) is a mechanism that enhances the spatial modeling of visual state space models by enabling efficient, multi-directional scans.
- It performs independent directional scans—horizontal, vertical, diagonal, and channel-wise—to enable robust global and local feature propagation with linear computational complexity.
- OSS integrates a directional scan module with an O-Attention fusion mechanism, significantly boosting performance in applications like image restoration and semantic segmentation.
Omni Selective Scan (OSS) is a mechanism for enhancing the spatial modeling capacity of visual state space models (SSMs). OSS addresses the critical limitation of unidirectional or causally sequenced SSMs by enabling efficient, bidirectional, and multi-directional information flow across two-dimensional grid structures and channel dimensions, while maintaining linear computational complexity. OSS underpins recent vision architectures such as VmambaIR and OCTOPUS, facilitating strong global and local feature propagation in a computationally efficient manner and resulting in state-of-the-art performance across various low-level and high-level vision tasks (Shi et al., 2024, Mahatha et al., 31 Jan 2026).
1. Foundations: State Space Models and Visual Sequence Modeling
State space models (SSMs) are rooted in control theory and are defined by continuous- or discrete-time dynamics that map an input sequence $x_t$ through a hidden state $h_t$ to an output $y_t$. The discretized evolution for standard SSMs is governed by

$$h_t = \bar{A} h_{t-1} + \bar{B} x_t, \qquad y_t = C h_t,$$

where the matrices $\bar{A}$, $\bar{B}$, and $C$ are learned. Although SSMs such as S4, S5, and Mamba provide efficient long-range sequence modeling with linear time and memory complexity, naïvely applying them to images via rasterization undermines local spatial relationships and fails to propagate information isotropically across the 2D grid. This causal, 1D formulation links non-adjacent pixels while ignoring direct neighbors, impeding the spatial coherence crucial for vision tasks (Shi et al., 2024, Mahatha et al., 31 Jan 2026).
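The discretized recurrence above can be sketched directly as a sequential scan. The matrices below are arbitrary toy values, not learned parameters:

```python
import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Run the discretized SSM recurrence h_t = A_bar h_{t-1} + B_bar x_t,
    y_t = C h_t over a 1D input sequence x of length T."""
    h = np.zeros(A_bar.shape[0])        # hidden state, size N
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t     # state update
        ys.append(C @ h)                # readout
    return np.array(ys)

# toy example: scalar input, 2-dimensional hidden state
A_bar = np.array([[0.9, 0.0], [0.0, 0.5]])
B_bar = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])
y = ssm_scan(np.ones(4), A_bar, B_bar, C)
```

The loop is written sequentially for clarity; practical SSMs such as Mamba compute the same recurrence with a parallel scan primitive.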
2. Multi-Directional and Omni-Directional Feature Propagation
Omni Selective Scan (OSS) generalizes the recurrence mechanism of SSMs by performing independent, discrete scans in multiple directions. In VmambaIR, OSS performs six bidirectional scans: horizontal forward/backward, vertical forward/backward, and channel-wise forward/backward (Shi et al., 2024). In OCTOPUS, OSS extends this further to eight principal spatial orientations: right (→), left (←), down (↓), up (↑), southeast (↘), northwest (↖), southwest (↙), and northeast (↗) (Mahatha et al., 31 Jan 2026).
Each scan processes a set of independent 1D lines (rows, columns, or diagonals for spatial dimensions; channels for depth), applying SSM recurrences of the form

$$h_t^{(d,\ell)} = \bar{A}^{(d)} h_{t-1}^{(d,\ell)} + \bar{B}^{(d)} x_t^{(d,\ell)}, \qquad y_t^{(d,\ell)} = C^{(d)} h_t^{(d,\ell)} \odot g^{(d)},$$

where $d$ indexes the direction, $\ell$ indexes the scan-line, and $g^{(d)}$ is a learned gate (Mahatha et al., 31 Jan 2026). All directions are processed independently in parallel, preserving strict $O(HW)$ complexity.
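A minimal sketch of the multi-directional idea: run one causal scan routine over flipped or transposed copies of the feature map, then undo the flip so every output is aligned to pixel locations. Here an exponential moving average stands in for the learned SSM recurrence, and only four of the eight OCTOPUS directions are shown:

```python
import numpy as np

def directional_scan(x, decay=0.9):
    """Causal scan along axis 0 of a 2D map x (H, W); an exponential
    moving average is used as a stand-in for the SSM recurrence."""
    out = np.zeros_like(x)
    acc = np.zeros(x.shape[1])
    for i in range(x.shape[0]):
        acc = decay * acc + x[i]
        out[i] = acc
    return out

def omni_scan(x):
    """Four independent directional scans (down/up/right/left); each output
    is flipped back so features align with the original pixel grid."""
    return {
        "down":  directional_scan(x),
        "up":    np.flip(directional_scan(np.flip(x, 0)), 0),
        "right": directional_scan(x.T).T,
        "left":  np.flip(directional_scan(np.flip(x, 1).T).T, 1),
    }

outs = omni_scan(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Diagonal and channel-wise directions follow the same pattern: extract the relevant 1D lines, scan them causally, and scatter the results back.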
3. OSS Block Structure, Traversal Selection, and Fusion
The OSS block comprises a directional scan module and an efficient feature fusion scheme. Each directional scan outputs a set of features aligned to 2D pixel locations. After all directions are processed, a traversal selection (O-Attention) mechanism fuses the multi-directional context at each spatial location. Specifically, for each pixel $(i, j)$:
- The outputs from all scanned directions $\{y^{(d)}_{i,j}\}_{d=1}^{D}$ are stacked into $Y_{i,j} \in \mathbb{R}^{D \times C}$, where $D$ is the number of directions.
- Two $1 \times 1$ convolutions (or linear layers) compute scores $s^{(d)}_{i,j}$, followed by a softmax normalization over directions, yielding attention weights $\alpha^{(d)}_{i,j}$.
- The fused output is $\hat{y}_{i,j} = \sum_{d=1}^{D} \alpha^{(d)}_{i,j}\, y^{(d)}_{i,j}$ (Mahatha et al., 31 Jan 2026).
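The three steps above amount to a softmax-weighted convex combination over the direction axis at every pixel. A minimal single-channel sketch, where plain $(D, D)$ linear maps stand in for the two $1 \times 1$ convolutions (an illustrative assumption about their shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

def o_attention_fuse(Y, W1, W2):
    """Traversal-selection (O-Attention) fusion sketch for one channel.
    Y:      (D, H, W) stacked per-direction scan outputs.
    W1, W2: (D, D) weights standing in for the two 1x1 convolutions."""
    s1 = np.maximum(np.einsum('ed,dhw->ehw', W1, Y), 0)   # first map + ReLU
    s = np.einsum('ed,dhw->ehw', W2, s1)                  # scores per direction
    e = np.exp(s - s.max(axis=0, keepdims=True))          # stable softmax
    alpha = e / e.sum(axis=0, keepdims=True)              # weights over directions
    return (alpha * Y).sum(axis=0)                        # per-pixel weighted sum

D, H, W = 8, 4, 4
Y = rng.normal(size=(D, H, W))
fused = o_attention_fuse(Y, rng.normal(size=(D, D)), rng.normal(size=(D, D)))
```

Because the softmax weights sum to one, the fused value at each pixel always lies inside the range spanned by the directional outputs there.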
In VmambaIR, additional channel-wise SSM scans are incorporated after spatial fusion, followed by a $1 \times 1$ convolutional projection (Shi et al., 2024).
Alongside the OSS module, the Efficient Feed-Forward Network (EFFN) operates on the output, comprising a $1 \times 1$ expansion, depthwise convolution, gated linear unit, and final $1 \times 1$ projection. This structure enables nonlinear and cross-channel mixing at low computational cost (Shi et al., 2024).
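A minimal sketch of this pipeline, with illustrative (assumed) layer shapes: the expansion doubles the width so the gated linear unit can split it into value and gate halves, and a $3 \times 3$ kernel is assumed for the depthwise step:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def depthwise3x3(x, k):
    """Depthwise 3x3 convolution with zero padding; x: (H, W, E), k: (3, 3, E)."""
    H, W, _ = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + H, dx:dx + W] * k[dy, dx]
    return out

def effn(x, W_up, k_dw, W_down):
    """EFFN sketch: 1x1 expansion -> depthwise conv -> GLU -> 1x1 projection.
    x: (H, W, C); W_up: (C, 2E); k_dw: (3, 3, 2E); W_down: (E, C)."""
    u = x @ W_up                      # 1x1 expansion to 2E channels
    u = depthwise3x3(u, k_dw)         # cheap spatial mixing per channel
    v, g = np.split(u, 2, axis=-1)    # gated linear unit: value * sigmoid(gate)
    return (v * sigmoid(g)) @ W_down  # 1x1 projection back to C channels

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))
out = effn(x, rng.normal(size=(8, 32)),
           rng.normal(size=(3, 3, 32)), rng.normal(size=(16, 8)))
```

Both $1 \times 1$ layers reduce to per-pixel matrix multiplies, which is what keeps the EFFN's cost linear in the number of pixels.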
4. Computational Complexity and Efficiency
Unlike the quadratic complexity of transformer self-attention ($O(H^2W^2)$ for an $H \times W$ image), OSS's total complexity is linear in the number of patches: $O(D \cdot HW)$, with the number of directions $D$ and the SSM state size held constant ($D = 6$ for VmambaIR, $D = 8$ for OCTOPUS). All components (scan, gating, O-Attention) scale as $O(HW)$ (Shi et al., 2024, Mahatha et al., 31 Jan 2026). Empirically, OSS in VmambaIR reported only a marginal FLOP increase over a single-direction SSM while substantially expanding the model's 2D and channel context (Shi et al., 2024).
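A rough operation count makes the gap concrete. The grid size and the assumed state size $N = 16$ below are illustrative, and constant factors are ignored:

```python
# Rough per-layer operation-count comparison (constants ignored).
H, W = 64, 64        # patch grid
D, N = 8, 16         # scan directions (OCTOPUS) and assumed SSM state size
tokens = H * W

attn_ops = tokens ** 2        # self-attention: quadratic in token count
oss_ops = D * tokens * N      # OSS: D independent linear scans, O(HW) each

ratio = attn_ops / oss_ops
print(ratio)
```

At this resolution the attention count is already 32x larger, and the gap widens quadratically as the grid grows while OSS stays linear.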
5. Architectural Integration and Practical Deployment
OSS blocks are modular and readily integrated into hierarchical architectures. In VmambaIR, a four-stage U-Net variant is used:
- Encoder: sequential OSS blocks at progressively reduced spatial resolutions.
- Decoder: upsampling, additional OSS blocks, and skip concatenations.
- Refinement: multiple OSS blocks at full resolution followed by a pixel-shuffle or convolutional output module, depending on the task (Shi et al., 2024).
In OCTOPUS, OSS is the foundational layer for vision SSMs, replacing standard raster-scan or unidirectional recurrence with true multi-directional propagation. Traversal selection is key to adaptively fusing the multi-orientation outputs at each pixel (Mahatha et al., 31 Jan 2026).
6. Empirical Performance and Analysis
OSS enables state-of-the-art results in both image restoration and semantic segmentation:
- VmambaIR achieves 29.99 dB (Urban100, 4× SR), outperforming BebyGAN (29.19 dB), with LPIPS 0.0496 vs 0.0529, and demonstrates significant efficiency gains: 27.06 dB (NTIRE2020, 4× real SR) using 10.5 M parameters and 20.5 G FLOPs, compared to MM-RealSR’s 25.19 dB/26.13 M/78.6 G (Shi et al., 2024).
- On Rain100H deraining, VmambaIR attains 31.66 dB/0.909 SSIM, exceeding Restormer’s 31.46 dB/0.904 with lower computational cost (Shi et al., 2024).
- Ablations confirm the importance of both planar and channel-wise OSS scanning; removing planar or channel scanning reduces PSNR by 0.43 dB and 0.14 dB, respectively (Shi et al., 2024).
- OCTOPUS demonstrates substantial improvements on segmentation (ADE20K single-scale mIoU: 37.93% for Octopus-T vs 22.77% for VMamba-T), cleaner object boundaries, and improved region consistency. Classification accuracy on miniImageNet also increases compared to previous vision SSMs (Octopus-T: 86.60% Top-1 vs 85.82% for VMamba-T) (Mahatha et al., 31 Jan 2026).
An analysis of the effective receptive field in OCTOPUS indicates the emergence of isotropic, eight-spoked coverage, superior to the window-based localities of Swin transformer and anisotropy of VMamba, reflecting OSS’s enhancement of 2D spatial awareness (Mahatha et al., 31 Jan 2026).
7. Significance and Perspectives
By overcoming the causality and locality constraints of standard SSMs, OSS establishes a path for scalable, spatially-aware, and efficient vision architectures. Its ability to tightly couple global context modeling and local spatial coherence, while maintaining strict linear complexity and plug-and-play architectural integration, positions OSS as a foundational operator for next-generation visual SSMs. The demonstrated empirical gains in restoration and segmentation, together with interpretability through effective receptive field analyses, underscore OSS’s impact in both theoretical modeling and practical system performance (Shi et al., 2024, Mahatha et al., 31 Jan 2026).
| Aspect | VmambaIR (6 directions) | OCTOPUS (8 directions) |
|---|---|---|
| Spatial scan directions | H/W ±, Channels ± | All axes ±, diagonals ± |
| Fusion mechanism | Addition and projection | Traversal selection (O-Attention) |
| Core SSM type | Mamba | Mamba |
| Complexity per pass | $O(HW)$ | $O(HW)$ |
| Empirical improvement | SR/Derain SOTA, efficient | Segmentation/classification boost |