
Cross State-Space Propagation (CSSP)

Updated 3 July 2026
  • CSSP is a mechanism that propagates features across video frames using state-space modeling to enhance alignment in medical video super-resolution.
  • It uses neighbor-driven updates and distant-driven observations through learnable convolutional modules to filter out misaligned or artifact-prone inputs.
  • Empirical studies show that CSSP improves temporal consistency and PSNR, demonstrating robustness against camera shake and tissue deformation.

Cross State-Space Propagation (CSSP) is a recurrent feature propagation mechanism central to the MedVSR framework for medical video super-resolution. CSSP addresses the alignment challenges in low-resolution medical videos—namely, camera shake, noise, abrupt frame transitions, and the nuanced, continuous structures of tissue—by embedding information from both neighboring and distant frames into learnable state-space dynamics. CSSP achieves this by projecting distant frames into the observation matrices of a state-space model (SSM), allowing consistent and informative features to propagate recurrently while filtering out misaligned or artifact-prone content (Liu et al., 25 Sep 2025).

1. Fundamental Principles of Cross State-Space Propagation

CSSP leverages a linear, discrete-time state-space model structurally akin to the Kalman filter. Let $h_i \in \mathbb{R}^N$ denote the hidden state at step $i$, $x_i \in \mathbb{R}^d$ the input token, and $y_i \in \mathbb{R}^m$ the output token. The standard SSM equations are:

$$h_i = A h_{i-1} + B x_i, \quad y_i = C h_i$$

where $A \in \mathbb{R}^{N \times N}$ is the state transition matrix, $B \in \mathbb{R}^{N \times d}$ the input matrix, and $C \in \mathbb{R}^{m \times N}$ the observation matrix.
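
The recurrence above can be unrolled token by token. The following is a minimal NumPy sketch of the standard (input-independent) SSM only; the function name `ssm_scan` and the toy matrices are illustrative assumptions, not part of MedVSR.

```python
import numpy as np

def ssm_scan(A, B, C, xs, h0=None):
    """Unroll h_i = A h_{i-1} + B x_i, y_i = C h_i over a token sequence.

    A: (N, N) state transition, B: (N, d) input matrix, C: (m, N) observation
    matrix, xs: (L, d) tokens. Returns outputs (L, m) and the final state (N,).
    """
    h = np.zeros(A.shape[0]) if h0 is None else h0
    ys = []
    for x in xs:
        h = A @ h + B @ x      # state update
        ys.append(C @ h)       # observation
    return np.stack(ys), h

# Toy example (all values are assumptions chosen for easy checking).
A = 0.5 * np.eye(2)
B = np.ones((2, 1))
C = np.array([[1.0, 0.0]])
xs = np.ones((3, 1))
ys, h_last = ssm_scan(A, B, C, xs)
```

With a constant input of 1 and a decay of 0.5, the first state component evolves as 1, 1.5, 1.75, which is what `ys` contains.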

CSSP keeps $A$ structured (diagonal or block-diagonal, fixed or learned) but, unlike classical state-space approaches, makes $B$ and $C$ input-dependent. Specifically, $B$ is modulated by neighbor-frame features, and $C$ dynamically incorporates warped distant-frame features.
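
To make the input dependence concrete, here is a hedged sketch of a scan in which the input matrix is computed from neighbor-frame tokens and the observation matrix from distant-frame tokens. The per-channel state layout (similar to selective-scan models), the linear projections `W_B` and `W_C`, and all shapes are assumptions for illustration; the source does not specify them.

```python
import numpy as np

def cross_ssm_scan(a_diag, W_B, W_C, neighbor, distant):
    """Scan with B_i derived from neighbor tokens and C_i from distant tokens.

    a_diag: (N,) diagonal of A. W_B, W_C: (N, d) hypothetical projections.
    neighbor, distant: (L, d) token sequences. Returns (L, d) outputs.
    """
    N = a_diag.shape[0]
    L, d = neighbor.shape
    h = np.zeros((N, d))                 # one N-dim state per feature channel
    out = np.empty((L, d))
    for i in range(L):
        B_i = W_B @ neighbor[i]          # (N,) input vector from neighbor frame
        C_i = W_C @ distant[i]           # (N,) observation vector from distant frame
        h = a_diag[:, None] * h + np.outer(B_i, neighbor[i])
        out[i] = C_i @ h                 # read the state through distant-frame C_i
    return out

rng = np.random.default_rng(0)
N, d, L = 4, 3, 5
a_diag = np.full(N, 0.9)
W_B = rng.normal(size=(N, d))
W_C = rng.normal(size=(N, d))
neighbor = rng.normal(size=(L, d))
distant = rng.normal(size=(L, d))

y = cross_ssm_scan(a_diag, W_B, W_C, neighbor, distant)
# If the distant tokens carry nothing, the observation is zero and so is the output.
y_blind = cross_ssm_scan(a_diag, W_B, W_C, neighbor, np.zeros((L, d)))
```

In this sketch the distant frame never writes into the hidden state; it only decides how the state is read out, which is one way to interpret the neighbor-update and distant-observation split.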

2. CSSP Mechanism: Cross-Frame Feature Projection

At each temporal step and in each propagation branch, CSSP operates as follows:

  • Feature extraction: One feature map is taken from the neighboring frame, and a second from a distant frame, warped via composite flow.
  • Tokenization: Both feature maps are partitioned into non-overlapping local windows, forming a neighbor-frame token sequence and a distant-frame token sequence.
  • Parameter computation: Through 1D convolutions with LayerNorm and gating, CSSP yields:
    • Inputs and state-update parameters from the neighbor-frame tokens
    • The observation matrix $C$ from the warped distant-frame tokens via a learnable position embedding (LPE).
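
The tokenization step can be sketched as a plain window partition. The window size used below (2) and the channel-first layout are assumptions for the example, not values from the source.

```python
import numpy as np

def window_tokens(feat, win):
    """Partition a (C, H, W) feature map into non-overlapping win x win windows.

    Returns an array of shape (num_windows, win * win, C): one token sequence
    per window, each token being the C-dim feature at one pixel.
    """
    C, H, W = feat.shape
    if H % win or W % win:
        raise ValueError("H and W must be divisible by the window size")
    x = feat.reshape(C, H // win, win, W // win, win)
    x = x.transpose(1, 3, 2, 4, 0)        # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)

feat = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
tokens = window_tokens(feat, win=2)       # 4 windows of 4 tokens, 2 channels
```

The same partition would be applied to the neighbor-frame and the warped distant-frame feature maps so that their token sequences stay spatially aligned.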

The SSM is unrolled over the tokens, with the hidden state driven by neighbor-frame features and the output projected through matrices modulated by distant-frame features. After per-token gating, concatenation, and MLP-based residual lifting, a deformable convolution fuses the propagated features with the extracted feature maps for alignment.

3. Cross-Frame Coupling and Robustness

The cross aspect of CSSP lies in this split: neighbor-frame tokens govern the state update, while warped distant-frame tokens control the observation, so information from both sources is integrated within a single SSM recurrence. When optical flow between non-consecutive frames is unreliable, the architecture avoids direct warping and instead uses a composite warp towards the nearest frame, which increases robustness to scene discontinuities and flow estimation errors. The token-wise gating mechanism further suppresses the influence of misaligned regions by downweighting inconsistent outputs.
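
As an illustration of token-wise gating, the sketch below scales each output token by a sigmoid gate in (0, 1). How MedVSR actually computes its gate is not stated here, so the linear score `ref @ w_gate` and the names `gate_tokens`, `ref`, and `w_gate` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate_tokens(y, ref, w_gate, b_gate=0.0):
    """Downweight SSM outputs token by token.

    y: (L, d) SSM outputs. ref: (L, d) features that drive the gate.
    w_gate: (d,) hypothetical learned scoring vector.
    Returns the gated outputs (L, d) and the gate values (L,).
    """
    g = sigmoid(ref @ w_gate + b_gate)    # one scalar gate per token
    return g[:, None] * y, g

y = np.ones((3, 2))
ref = np.array([[10.0, 10.0], [0.0, 0.0], [-10.0, -10.0]])
gated, g = gate_tokens(y, ref, w_gate=np.ones(2))
```

Tokens with a strongly negative score are driven close to zero, which is the sense in which inconsistent outputs are suppressed rather than hard-masked.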

Ablation studies demonstrate that this cross-coupling confers a PSNR improvement of approximately 0.3 dB over single-frame SSM propagation and that omitting distant-frame control leads to a degradation of 0.32 dB, establishing the centrality of cross-frame dynamics in effective temporal alignment (Liu et al., 25 Sep 2025).
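
For scale, PSNR is defined as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, so a gain of about 0.3 dB corresponds to roughly 6.7% lower mean squared error. The helper names below are illustrative, and the `max_val=1.0` default assumes images normalized to [0, 1].

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def mse_ratio_for_gain(delta_db):
    """Factor by which MSE shrinks when PSNR rises by delta_db."""
    return 10.0 ** (-delta_db / 10.0)

example_psnr = psnr(np.full((4, 4), 0.1), np.zeros((4, 4)))  # MSE = 0.01
ratio = mse_ratio_for_gain(0.3)
```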

4. Algorithmic Workflow and Pseudocode

The following workflow summarizes CSSP for a single time-step and branch:

  1. Inputs: neighbor-frame and distant-frame features, together with the optical flow between frames
  2. Warp: Compute the warped distant-frame features via composite warp
  3. Tokenize: Obtain the neighbor-frame and distant-frame token sequences using local windowing
  4. Convolutional processing: Compute the input-dependent SSM parameters via gated Conv1d and LayerNorm
  5. SSM propagation: For each token $x_i$ in the neighbor-frame sequence
    • $h_i = A h_{i-1} + B x_i$
    • $y_i = C h_i$
  6. Gating and aggregation: Apply token-wise gating and output projection
  7. Residual enhancement: Combine with MLP and residual connection
  8. Alignment: Fuse with the frame features via deformable convolution
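
Steps 4 to 7 can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: plain linear maps stand in for the gated Conv1d layers, the LayerNorm has no learned affine terms, the weight names in `params` are hypothetical, and the warp (step 2) and deformable-convolution fusion (step 8) are left out.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cssp_step(neighbor, distant_warped, p):
    """One simplified CSSP pass over already-tokenized (L, d) sequences."""
    n = layer_norm(neighbor)
    w = layer_norm(distant_warped)
    # Step 4: input-dependent parameters (linear stand-ins for gated Conv1d).
    B = n @ p["W_B"]                      # (L, N) from neighbor tokens
    C = w @ p["W_C"]                      # (L, N) from distant tokens
    gate = sigmoid(n @ p["W_g"])          # (L, d) token-wise gate
    # Step 5: SSM propagation with diagonal A.
    h = np.zeros((p["a"].shape[0], n.shape[1]))
    y = np.empty_like(n)
    for i in range(n.shape[0]):
        h = p["a"][:, None] * h + np.outer(B[i], n[i])
        y[i] = C[i] @ h
    # Step 6: gating and output projection.
    y = (gate * y) @ p["W_out"]
    # Step 7: MLP with a residual connection back to the neighbor tokens.
    hidden = np.maximum(y @ p["W1"], 0.0)
    return neighbor + hidden @ p["W2"]

rng = np.random.default_rng(0)
L, d, N = 6, 4, 8
params = {
    "a": np.full(N, 0.9),
    "W_B": 0.1 * rng.normal(size=(d, N)),
    "W_C": 0.1 * rng.normal(size=(d, N)),
    "W_g": 0.1 * rng.normal(size=(d, d)),
    "W_out": 0.1 * rng.normal(size=(d, d)),
    "W1": 0.1 * rng.normal(size=(d, 2 * d)),
    "W2": 0.1 * rng.normal(size=(2 * d, d)),
}
neighbor = rng.normal(size=(L, d))
distant_warped = rng.normal(size=(L, d))
out = cssp_step(neighbor, distant_warped, params)
```

Because of the residual connection, the output falls back to the neighbor tokens whenever the propagated branch contributes nothing, for example when the final MLP weights are zero.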


5. Empirical Performance and Training Regimen

MedVSR with CSSP improves on BasicVSR++ across medical video datasets (HyperKvasir, LDPolyp, EndoVis18), achieving up to 0.37 dB higher PSNR while using fewer parameters. Qualitative assessments indicate that artifacts from misaligned distant frames are effectively suppressed. The pipeline employs bidirectional CSSP branches, tokenization with local windows, an SSM hidden state, and learnable position embeddings based on 2D depthwise convolution. Training uses the Charbonnier loss, cosine learning rate decay, HR patches, Gaussian noise, and bicubic downsampling. SpyNet optical flow is employed for the composite warp. Inner State-Space Reconstruction (ISSR) and large-kernel separable convolutions enhance final reconstruction (Liu et al., 25 Sep 2025).
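
The two training ingredients named above have simple closed forms. The sketch below shows a common formulation of each; the smoothing constant `eps`, the learning rates, and the step counts are placeholder assumptions rather than values reported for MedVSR.

```python
import math
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Smooth L1-like loss: mean of sqrt((pred - target)^2 + eps^2)."""
    diff = np.asarray(pred) - np.asarray(target)
    return float(np.mean(np.sqrt(diff ** 2 + eps ** 2)))

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

loss_identical = charbonnier_loss(np.zeros((2, 2)), np.zeros((2, 2)))
lr_start = cosine_lr(0, 100, lr_max=1e-4)
lr_mid = cosine_lr(50, 100, lr_max=1e-4)
lr_end = cosine_lr(100, 100, lr_max=1e-4)
```

For identical inputs the Charbonnier loss equals `eps` rather than zero, and for large errors it behaves like an L1 loss, which is why it is often preferred over L2 in restoration tasks.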

6. Architectural Choices and Practical Implications

CSSP achieves effective temporal alignment and information selection in challenging video conditions. Its architectural innovations—a small recurrent SSM with neighbor-driven updates and distant-driven observations, input-dependent parameterization via convolutional features, and robust gating—directly address instability in alignment when classical optical flow is unreliable due to camera shake or tissue deformation. The method is readily extensible to other video analysis domains where temporal consistency and artifact rejection are critical, particularly when imaging scenes with repetitive or ambiguous textures.

7. Context and Impact in Medical Video Super-Resolution

By integrating cross-frame feature propagation into the state-space modeling framework, CSSP advances the fidelity and reliability of medical video super-resolution, a domain where stringent alignment is necessary to avoid diagnostic artifacts. The cross-recurrence mechanism ensures that only consistent features propagate, reducing the risk of reconstructing misleading content in frames with challenging optical flow. The demonstrated improvements in PSNR and qualitative artifact rejection situate CSSP as a robust solution for real-world low-resolution medical video enhancement and set a precedent for future state-space recurrent modeling in video restoration tasks (Liu et al., 25 Sep 2025).
