ChainedMamba: Refined Bidirectional State-Space Model
- ChainedMamba is a refined bidirectional state-space model that chains forward S6 scan outputs into a backward scan to capture high-level geometric features.
- It improves geometric context aggregation in point cloud analysis without extra parameters by leveraging a chained processing paradigm.
- Empirical results on ModelNet40 demonstrate an accuracy boost from 92.69% to 93.65%, underscoring its practical impact on feature learning.
ChainedMamba is a parameter-free architectural refinement of the bidirectional Mamba module for selective state-space modeling, designed to enhance high-level geometric context aggregation in point cloud analysis through a chained forward-backward processing paradigm. It is introduced within the CloudMamba framework and leverages the outputs of a forward S6 (selective state-space) scan as the input for a subsequent backward scan, thereby enabling richer geometric feature learning while maintaining linear computational complexity.
1. Motivations and Relation to Standard/Parallel Mamba
The Mamba architecture builds upon the S6 selective state-space layer, a causal scanning operator in which each time step t receives information only from preceding steps 1, ..., t, not from any future steps. This unidirectional property poses challenges for data such as point clouds or images, which demand access to global, bidirectional context because such modalities lack an inherent ordering.
A parallel bidirectional Mamba implements two independent S6 scans: a forward S6 in the original sequence order, outputting H_f, and a backward S6 on the reversed sequence, yielding H_b. The resulting features at each position are fused (e.g., concatenated or summed). However, in this arrangement the backward S6 processes only the raw input sequence and cannot leverage the hierarchical structure learned during the forward scan.
ChainedMamba refines the above by "chaining" the process: the output features H_f of the forward S6 scan are supplied as the input to the backward S6 scan. Thus, the backward pass at position i is conditioned not only on primitive input data but also on high-order, forward-aggregated semantics from positions i, ..., L. On ModelNet40, this transition from parallel to chained structure increases overall accuracy from 92.69% to 93.65% (Tab. 6).
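The parallel-versus-chained distinction can be made concrete with a toy causal aggregator. The sketch below uses a cumulative sum as a stand-in for an S6 scan (an assumption for illustration; a real S6 layer is an input-dependent linear recurrence, but cumsum shares the same direction-of-information flow), showing that the chained backward scan consumes already-aggregated forward features rather than the raw input:

```python
import numpy as np

def causal_scan(x):
    """Stand-in for an S6 scan: each output aggregates all preceding inputs."""
    return np.cumsum(x, axis=0)

x = np.arange(1.0, 5.0)          # toy 4-token sequence: [1, 2, 3, 4]

# Parallel bidirectional: the backward scan sees only the raw input.
fwd = causal_scan(x)
bwd_parallel = causal_scan(x[::-1])[::-1]

# Chained (ChainedMamba-style): the backward scan consumes the forward outputs.
bwd_chained = causal_scan(fwd[::-1])[::-1]

print(fwd)           # [ 1.  3.  6. 10.]
print(bwd_parallel)  # [10.  9.  7.  4.]
print(bwd_chained)   # [20. 19. 16. 10.]
```

Note that each chained backward feature mixes prefix aggregates from the forward pass, so even the last position (which a plain backward scan would see in isolation) carries global context.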
2. Architecture and Workflow
ChainedMamba operates within each axis of the hexa-orientation block of CloudMamba, processing three separately sorted causal sequences (X-, Y-, and Z-sorted). For a single sorting axis, the workflow is as follows:
- Input Preparation: Construct a causal sequence Seq of length L+1 by prepending a prompt token P (with positional encoding) to the sequence of embedded point features T_1, ..., T_L, each augmented with its own positional encoding.
- Forward S6 Scan: Apply the grouped selective state-space model (GS6) to Seq, generating the forward output H_f of length L+1.
- Reversal: Discard the prompt position of H_f and reverse the remaining L features to obtain the backward scan input.
- Backward S6 Scan: Process the reversed sequence with GS6 in the backward direction, resulting in H_b.
- Order Restoration and Output: Reverse H_b to align with the original point order, producing the output O of length L.
The forward and backward GS6 modules share grouping and parameter-sharing hyperparameters but maintain separate state memory.
3. Mathematical Formulation
Given a causal sequence (s_0, s_1, ..., s_L) (including the prompt at position 0), the forward S6 or GS6 outputs are computed by the discretized selective recurrence

h_t = Ā_t h_{t-1} + B̄_t s_t,   y_t = C_t h_t,

where the parameters Ā_t, B̄_t, and C_t are functions of the current input s_t (via the input-dependent step size Δ_t).
For the backward chained scan, the reversed forward outputs y_L, ..., y_1 serve as input:

g_t = Ā′_t g_{t-1} + B̄′_t y_{L+1-t},   y′_t = C′_t g_t,

with the output sequence reversed such that the final feature at position i is o_i = y′_{L+1-i}, for i = 1, ..., L.
No additional loss functions or regularization terms are introduced; ChainedMamba inherits the loss defined for the task (e.g., cross-entropy).
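The recurrences above can be sketched numerically. The example below is a simplified linear SSM with fixed matrices (an assumption for clarity; the real S6/GS6 layer derives Ā_t, B̄_t, C_t from each input token via Δ_t), chaining a forward scan into a backward scan exactly as in the formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, n = 6, 4, 8          # sequence length, feature dim, state dim

# Fixed (input-independent) SSM parameters -- a simplification of S6.
A = 0.9 * np.eye(n)                 # decaying state transition
B = 0.1 * rng.normal(size=(n, d))   # input projection
C = 0.1 * rng.normal(size=(d, n))   # output projection

def scan(X):
    """Causal linear recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(n)
    Y = np.empty_like(X)
    for t, x in enumerate(X):
        h = A @ h + B @ x
        Y[t] = C @ h
    return Y

X = rng.normal(size=(L, d))
Yf = scan(X)                 # forward scan
Yb = scan(Yf[::-1])[::-1]    # chained backward scan, order restored
assert Yb.shape == X.shape   # one feature per input position, same dim
```

Both passes reuse the same `scan` here for brevity; in ChainedMamba the forward and backward GS6 modules maintain separate states (and, per the sharing configuration, separate parameters).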
4. Implementation and Pseudocode
A high-level pseudocode (omitting GS6 internal structure) for ChainedMamba on a single axis is:
```
Seq[0] = P + Epos                   # prompt token plus positional encoding
for i in 1..L:
    Seq[i] = T[i] + rho(coord[i])   # embedded feature plus positional encoding
FwdOut  = GS6_forward(Seq)          # forward scan, length L+1
FwdData = FwdOut[1..L]              # discard the prompt output
RevData = reverse(FwdData)
BwdRevOut = GS6_forward(RevData)    # backward scan on the chained input
for i in 1..L:
    Out[i] = BwdRevOut[L - i + 1]   # restore the original point order
return Out
```
Each axis is processed independently, followed by axis-wise fusion. The forward and backward GS6 passes are independent except for parameter sharing configuration.
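The per-axis pseudocode translates directly into NumPy. The sketch below uses 0-based indexing and toy stand-ins for the learned modules (the `gs6` cumulative-sum scan, the `rho` positional encoding, and the zero `prompt` token are illustrative assumptions, not CloudMamba's actual components):

```python
import numpy as np

def chained_mamba_axis(T, coords, gs6, prompt, rho):
    """ChainedMamba on one sorted axis; gs6 is any causal scan operator.
    T: (L, d) embedded point features sorted along this axis,
    coords: (L, 3) sorted coordinates, prompt: (d,) prompt token."""
    seq = np.concatenate([prompt[None, :], T + rho(coords)], axis=0)  # (L+1, d)
    fwd = gs6(seq)              # forward scan over prompt + L features
    fwd_data = fwd[1:]          # discard the prompt output
    bwd = gs6(fwd_data[::-1])   # backward scan on the chained input
    return bwd[::-1]            # restore the original point order

# Toy stand-ins for the learned modules:
gs6 = lambda x: np.cumsum(x, axis=0)                  # causal aggregator
rho = lambda c: 0.01 * c.sum(axis=1, keepdims=True)   # scalar positional code

L, d = 5, 3
T = np.ones((L, d))
coords = np.arange(L * 3, dtype=float).reshape(L, 3)
out = chained_mamba_axis(T, coords, gs6, np.zeros(d), rho)
assert out.shape == (L, d)     # one output feature per input point
```

In the full block this function would be called three times, once per sorted axis, and the three outputs fused.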
5. High-Level Geometric Feature Aggregation
Unlike parallel bidirectional Mamba, in which the backward scan can only reconstruct low-level geometric relationships (e.g., nearest-neighbor distances), ChainedMamba enables the backward scan to process forward-aggregated features that already encode high-order geometric structure. As a result, features produced after the chained backward scan encapsulate both directional and global information, such as intricate object-level shapes and curved surface patterns. Empirical results on ModelNet40 demonstrate that ChainedMamba achieves better high-level geometry perception: parallel bidirectional Mamba attains 92.69% overall accuracy, while ChainedMamba increases this to 93.65% (Tab. 6).
6. Computational Complexity and Practical Considerations
ChainedMamba introduces no change in asymptotic runtime relative to parallel bidirectional Mamba: both variants perform exactly two S6 scans, and only the input to the second scan differs. The forward and backward GS6 scans each operate in O(L) time for sequences of length L and scale linearly with the number of point features. Therefore, the overall block complexity remains linear in the number of points, with linear empirical FLOPs and wall-clock scaling (Tab. 1–3; Appendix G, Fig. 4). ChainedMamba requires no additional learnable parameters or auxiliary losses, and can be incorporated wherever a bidirectional Mamba block might otherwise be applied.
7. Summary and Significance
ChainedMamba provides a direct, computationally efficient refinement to bidirectional selective state-space modeling. By feeding forward-pass features into the backward pass, the approach enables significantly better geometric modeling in unordered point sets, as evidenced by measurable accuracy gains without increased computational demand. This design principle may generalize to other modalities where hierarchical and bidirectional context are simultaneously required.