ChainedMamba: Refined Bidirectional State-Space Model
- ChainedMamba is a refined bidirectional state-space model that chains forward S6 scan outputs into a backward scan to capture high-level geometric features.
- It improves geometric context aggregation in point cloud analysis without extra parameters by leveraging a chained processing paradigm.
- Empirical results on ModelNet40 demonstrate an accuracy boost from 92.69% to 93.65%, underscoring its practical impact on feature learning.
ChainedMamba is a parameter-free architectural refinement of the bidirectional Mamba module for selective state-space modeling, designed to enhance high-level geometric context aggregation in point cloud analysis through a chained forward-backward processing paradigm. It is introduced within the CloudMamba framework and leverages the outputs of a forward S6 (selective state-space) scan as the input for a subsequent backward scan, thereby enabling richer geometric feature learning while maintaining linear computational complexity.
1. Motivations and Relation to Standard/Parallel Mamba
The Mamba architecture builds upon the S6 selective state-space layer, a causal scanning operator in which each time step t receives information only from preceding steps 1, ..., t, not from any future steps. This unidirectional property poses challenges for data such as point clouds or images, which demand access to global, bidirectional context because such modalities lack an inherent ordering.
A parallel bidirectional Mamba implements two independent S6 scans: a forward S6 in the original sequence order, outputting H_f, and a backward S6 on the reversed sequence, yielding H_b. The resulting features at each position are fused (e.g., concatenated or summed). However, in this arrangement the backward S6 processes only the raw input sequence and cannot leverage the hierarchical structure learned during the forward scan.
ChainedMamba refines the above by "chaining" the process: the output features H_f of the forward S6 scan are supplied as the input to the backward S6 scan. Thus, the backward pass at position i is conditioned not only on primitive input data but also on high-order, forward-aggregated semantics from positions i, ..., L. On ModelNet40, this transition from parallel to chained structure increases overall accuracy from 92.69% to 93.65% (Tab. 6).
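The parallel-versus-chained distinction can be made concrete with a toy causal aggregator. The sketch below uses a cumulative sum as a stand-in for an S6 scan (an assumption for illustration; a real S6 layer is an input-dependent linear recurrence, but cumsum shares the same direction-of-information flow), showing that the chained backward scan consumes already-aggregated forward features rather than the raw input:

```python
import numpy as np

def causal_scan(x):
    """Stand-in for an S6 scan: each output aggregates all preceding inputs."""
    return np.cumsum(x, axis=0)

x = np.arange(1.0, 5.0)          # toy 4-token sequence: [1, 2, 3, 4]

# Parallel bidirectional: the backward scan sees only the raw input.
fwd = causal_scan(x)
bwd_parallel = causal_scan(x[::-1])[::-1]

# Chained (ChainedMamba-style): the backward scan consumes the forward outputs.
bwd_chained = causal_scan(fwd[::-1])[::-1]

print(fwd)           # [ 1.  3.  6. 10.]
print(bwd_parallel)  # [10.  9.  7.  4.]
print(bwd_chained)   # [20. 19. 16. 10.]
```

Note that each chained backward feature mixes prefix aggregates from the forward pass, so even the last position (which a plain backward scan would see in isolation) carries global context.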
2. Architecture and Workflow
ChainedMamba operates within each axis of the hexa-orientation block of CloudMamba, processing three separately sorted causal sequences (X-, Y-, and Z-sorted). For a single sorting axis, the workflow is as follows:
- Input Preparation: Construct a causal sequence Seq of length L+1 by prepending a prompt token P (with positional encoding) to the sequence of embedded point features T_1, ..., T_L, each augmented with its own positional encoding.
- Forward S6 Scan: Apply the grouped selective state-space model (GS6) to Seq, generating the forward output H_f of length L+1.
- Reversal: Discard the prompt position of H_f and reverse the remaining L features to obtain the backward scan input.
- Backward S6 Scan: Process the reversed sequence with GS6 in the backward direction, resulting in H_b.
- Order Restoration and Output: Reverse H_b to align with the original point order, producing the output O of length L.
The forward and backward GS6 modules share grouping and parameter-sharing hyperparameters but maintain separate state memory.
3. Mathematical Formulation
Given a causal sequence (s_0, s_1, ..., s_L) (including the prompt at position 0), the forward S6 or GS6 outputs are computed by the discretized selective recurrence

h_t = Ā_t h_{t-1} + B̄_t s_t,   y_t = C_t h_t,

where the parameters Ā_t, B̄_t, and C_t are functions of the current input s_t (via the input-dependent step size Δ_t).
For the backward chained scan, the reversed forward outputs y_L, ..., y_1 serve as input:

g_t = Ā′_t g_{t-1} + B̄′_t y_{L+1-t},   y′_t = C′_t g_t,

with the output sequence reversed such that the final feature at position i is o_i = y′_{L+1-i}, for i = 1, ..., L.
No additional loss functions or regularization terms are introduced; ChainedMamba inherits the loss defined for the task (e.g., cross-entropy).
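The recurrences above can be sketched numerically. The example below is a simplified linear SSM with fixed matrices (an assumption for clarity; the real S6/GS6 layer derives Ā_t, B̄_t, C_t from each input token via Δ_t), chaining a forward scan into a backward scan exactly as in the formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, n = 6, 4, 8          # sequence length, feature dim, state dim

# Fixed (input-independent) SSM parameters -- a simplification of S6.
A = 0.9 * np.eye(n)                 # decaying state transition
B = 0.1 * rng.normal(size=(n, d))   # input projection
C = 0.1 * rng.normal(size=(d, n))   # output projection

def scan(X):
    """Causal linear recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(n)
    Y = np.empty_like(X)
    for t, x in enumerate(X):
        h = A @ h + B @ x
        Y[t] = C @ h
    return Y

X = rng.normal(size=(L, d))
Yf = scan(X)                 # forward scan
Yb = scan(Yf[::-1])[::-1]    # chained backward scan, order restored
assert Yb.shape == X.shape   # one feature per input position, same dim
```

Both passes reuse the same `scan` here for brevity; in ChainedMamba the forward and backward GS6 modules maintain separate states (and, per the sharing configuration, separate parameters).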
4. Implementation and Pseudocode
A high-level pseudocode (omitting GS6 internal structure) for ChainedMamba on a single axis is:
```
Seq[0] = P + Epos                   # prompt token plus positional encoding
for i in 1..L:
    Seq[i] = T[i] + rho(coord[i])   # embedded feature plus positional encoding
FwdOut  = GS6_forward(Seq)          # forward scan, length L+1
FwdData = FwdOut[1..L]              # discard the prompt output
RevData = reverse(FwdData)
BwdRevOut = GS6_forward(RevData)    # backward scan on the chained input
for i in 1..L:
    Out[i] = BwdRevOut[L - i + 1]   # restore the original point order
return Out
```
Each axis is processed independently, followed by axis-wise fusion. The forward and backward GS6 passes are independent except for parameter sharing configuration.
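The per-axis pseudocode translates directly into NumPy. The sketch below uses 0-based indexing and toy stand-ins for the learned modules (the `gs6` cumulative-sum scan, the `rho` positional encoding, and the zero `prompt` token are illustrative assumptions, not CloudMamba's actual components):

```python
import numpy as np

def chained_mamba_axis(T, coords, gs6, prompt, rho):
    """ChainedMamba on one sorted axis; gs6 is any causal scan operator.
    T: (L, d) embedded point features sorted along this axis,
    coords: (L, 3) sorted coordinates, prompt: (d,) prompt token."""
    seq = np.concatenate([prompt[None, :], T + rho(coords)], axis=0)  # (L+1, d)
    fwd = gs6(seq)              # forward scan over prompt + L features
    fwd_data = fwd[1:]          # discard the prompt output
    bwd = gs6(fwd_data[::-1])   # backward scan on the chained input
    return bwd[::-1]            # restore the original point order

# Toy stand-ins for the learned modules:
gs6 = lambda x: np.cumsum(x, axis=0)                  # causal aggregator
rho = lambda c: 0.01 * c.sum(axis=1, keepdims=True)   # scalar positional code

L, d = 5, 3
T = np.ones((L, d))
coords = np.arange(L * 3, dtype=float).reshape(L, 3)
out = chained_mamba_axis(T, coords, gs6, np.zeros(d), rho)
assert out.shape == (L, d)     # one output feature per input point
```

In the full block this function would be called three times, once per sorted axis, and the three outputs fused.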
5. High-Level Geometric Feature Aggregation
Unlike parallel bidirectional Mamba, in which the backward scan can only reconstruct low-level geometric relationships (e.g., nearest-neighbor distances), ChainedMamba enables the backward scan to process forward-aggregated features that already encode high-order geometric structure. As a result, features produced after the chained backward scan encapsulate both directional and global information, such as intricate object-level shapes and curved surface patterns. Empirical results on ModelNet40 demonstrate that ChainedMamba achieves better high-level geometry perception: parallel bidirectional Mamba attains 92.69% overall accuracy, while ChainedMamba increases this to 93.65% (Tab. 6).
6. Computational Complexity and Practical Considerations
ChainedMamba introduces no change in asymptotic runtime relative to parallel bidirectional Mamba: both variants perform exactly two S6 scans, and only the input to the second scan differs. The forward and backward GS6 scans each operate in O(L) time for sequences of length L and scale linearly with the number of point features. Therefore, the overall block complexity remains linear in the number of points, with linear empirical FLOPs and wall-clock scaling (Tab. 1–3; Appendix G, Fig. 4). ChainedMamba requires no additional learnable parameters or auxiliary losses, and can be incorporated wherever a bidirectional Mamba block might otherwise be applied.
7. Summary and Significance
ChainedMamba provides a direct, computationally efficient refinement to bidirectional selective state-space modeling. By feeding forward-pass features into the backward pass, the approach enables significantly better geometric modeling in unordered point sets, as evidenced by measurable accuracy gains without increased computational demand. This design principle may generalize to other modalities where hierarchical and bidirectional context are simultaneously required.