Dual Mamba-enhanced GCN
- The paper introduces DMbaGCN, which integrates dual Mamba modules (LSEMba and GCAMba) to counteract over-smoothing by dynamically modulating node-specific and global information.
- Its methodology leverages adaptive state-space modeling and bidirectional recurrence to fuse multi-layer node trajectories with global context, ensuring robust feature preservation.
- Empirical results show DMbaGCN outperforms standard GCNs with high accuracy (up to 95.61% on Amazon Photo) and stable performance across deep layers.
The Dual Mamba-enhanced Graph Convolutional Network (DMbaGCN) is a graph learning architecture designed to address the long-standing over-smoothing problem in deep Graph Neural Networks (GNNs). Over-smoothing refers to the rapid homogenization of node representations as network depth increases, which severely degrades node discriminability for tasks such as node classification. DMbaGCN augments the conventional Graph Convolutional Network (GCN) scheme by explicitly modeling node-specific progressive state evolution across layers and by incorporating global contextual information through a dual instantiation of Mamba selective state-space modules.
1. Conceptual Framework and Architectural Innovation
Standard GCNs (e.g., Kipf & Welling 2016) propagate node features layer by layer via a fixed linear propagation step, $H^{(l+1)} = \sigma\!\big(\hat{A} H^{(l)} W^{(l)}\big)$ with $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, which repeatedly averages neighbor information and eventually collapses feature distinctions. This protocol lacks adaptive gating for node state evolution and omits explicit modeling of global graph context.
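To make the fixed propagation rule concrete, here is a minimal PyTorch sketch of the symmetric normalization and a single GCN layer (helper names such as `normalized_adjacency` and `gcn_layer` are ours, not from the paper); stacking many such layers is what drives the averaging effect described above.

```python
import torch

def normalized_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a_tilde = adj + torch.eye(adj.size(0), device=adj.device)
    d_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a_tilde * d_inv_sqrt.unsqueeze(0)

def gcn_layer(a_hat: torch.Tensor, h: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """One fixed propagation step H^(l+1) = ReLU(A_hat H^(l) W^(l)).

    Repeated application averages neighborhoods and pulls embeddings together,
    which is the over-smoothing behaviour DMbaGCN is designed to counteract.
    """
    return torch.relu(a_hat @ h @ weight)
```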
DMbaGCN introduces two synergistic modules:
- Local State-Evolution Mamba (LSEMba): Applies a selective state-space model (built on the Mamba paradigm) over each node’s aggregated feature trajectory across network depth, enabling dynamic, node-specific control over information retention and update.
- Global Context-Aware Mamba (GCAMba): Employs a bidirectional Mamba recurrence mechanism across the entire set of initial node embeddings, yielding adaptive global context vectors infused into each node’s representation.
The final node embedding is a weighted fusion of the local and global outputs, $z_v = \alpha\, y_v^{\mathrm{local}} + (1-\alpha)\, y_v^{\mathrm{global}}$, where $y_v^{\mathrm{local}}$ is the per-node LSEMba output, $y_v^{\mathrm{global}}$ is the GCAMba context-aware output, and $\alpha$ is a mixing coefficient.
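A minimal sketch of this fusion, assuming a scalar mixing coefficient (the symbol `alpha` and its default value here are illustrative):

```python
import torch

def fuse_local_global(y_local: torch.Tensor, y_global: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Weighted fusion z = alpha * y_local + (1 - alpha) * y_global.

    y_local is the LSEMba output, y_global the GCAMba output; the mixing
    coefficient alpha is tuned per dataset.
    """
    return alpha * y_local + (1.0 - alpha) * y_global
```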
2. Detailed Module Mechanics
2.1 Local State-Evolution Mamba (LSEMba)
- Neighborhood Aggregation: Follows multi-hop GCN aggregation for $L$ layers: $X^{(k)} = \hat{A} X^{(k-1)}$ for $k = 1, \dots, L$, with $X^{(0)} = X$.
- Feature Trajectory Construction: For node $v$, form the depth-ordered sequence $s_v = \big(x_v^{(0)}, x_v^{(1)}, \dots, x_v^{(L)}\big)$.
- State-Space Parameterization: Initialize data-dependent transition parameters $(B_v, C_v, \Delta_v)$ via small MLPs applied to the node's trajectory.
- Discretization: Convert the continuous SSM parameters to discrete form via zero-order hold, with the shared state matrix $A$ following HiPPO initialization: $\bar{A}_v = \exp(\Delta_v A)$, $\bar{B}_v = (\Delta_v A)^{-1}\big(\exp(\Delta_v A) - I\big)\,\Delta_v B_v$.
- Selective Recurrence: Recursively update per-hop hidden states, $h_v^{(k)} = \bar{A}_v h_v^{(k-1)} + \bar{B}_v x_v^{(k)}$, with read-out $y_v^{(k)} = C_v h_v^{(k)}$.
The final LSEMba output for node $v$ is $y_v^{\mathrm{local}} = y_v^{(L)}$, the read-out at the deepest hop.
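The following PyTorch sketch illustrates the LSEMba idea under simplifying assumptions: a single diagonal state shared across feature channels (full Mamba keeps a separate state per channel), an Euler-style discretization of $B$, and module names of our own choosing. It is meant to show the data-dependent $(B, C, \Delta)$ and the per-hop recurrence, not to reproduce the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSEMbaSketch(nn.Module):
    """Simplified selective SSM over a node's depth trajectory (illustrative only).

    Input:  traj of shape (N, L+1, d) -- each node's features at hops 0..L.
    Output: (N, d) -- the read-out at the deepest hop, used as the local embedding.
    """

    def __init__(self, d: int, d_state: int = 16):
        super().__init__()
        # Diagonal continuous-time state matrix A with a HiPPO-style negative init.
        self.A_log = nn.Parameter(torch.log(torch.arange(1, d_state + 1).float()))
        # Small networks producing the data-dependent parameters (B, C, Delta).
        self.to_B = nn.Linear(d, d_state)
        self.to_C = nn.Linear(d, d_state)
        self.to_delta = nn.Linear(d, 1)
        self.out = nn.Linear(d_state, d)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        N, T, d = traj.shape
        A = -torch.exp(self.A_log)                     # (d_state,) negative diagonal
        h = traj.new_zeros(N, A.numel())               # per-node hidden state
        y = traj.new_zeros(N, d)
        for t in range(T):                             # scan over hops 0..L
            x_t = traj[:, t, :]                        # (N, d) features at hop t
            delta = F.softplus(self.to_delta(x_t))     # (N, 1) per-node step size
            A_bar = torch.exp(delta * A)               # discretized decay factor
            B_bar = delta * self.to_B(x_t)             # input term (Euler step)
            h = A_bar * h + B_bar                      # selective state update
            y = self.out(self.to_C(x_t) * h)           # data-dependent read-out
        return y                                       # output at the last hop
```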
2.2 Global Context-Aware Mamba (GCAMba)
- Global Sequence Construction: Concatenate all initial node features into a sequence $S = \big(x_1^{(0)}, x_2^{(0)}, \dots, x_N^{(0)}\big)$.
- Parameter Initialization: Use small networks to produce the data-dependent SSM parameters $(Q, R, \Delta)$.
- Bidirectional Recurrence: Forward and reversed scans over $S$ allow each node to integrate information from all other nodes: the forward pass computes $\overrightarrow{h}_i = \bar{A}\,\overrightarrow{h}_{i-1} + \bar{Q}\,x_i^{(0)}$ with read-out $\overrightarrow{y}_i = R\,\overrightarrow{h}_i$, and an analogous pass over the reversed sequence yields $\overleftarrow{y}$.
- Context Fusion: Combine the forward and backward outputs with a residual term, $y_i^{\mathrm{global}} = \overrightarrow{y}_i + \mathrm{rev}(\overleftarrow{y})_i + \beta\, x_i^{(0)}$, where $\mathrm{rev}(\cdot)$ denotes sequence reversal and $\beta$ is a tunable coefficient.
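A matching sketch for GCAMba, again with a simplified selective recurrence and illustrative names; the $(Q, R, \Delta)$ projections, the bidirectional scan over the node sequence, and the $\beta$-weighted residual follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCAMbaSketch(nn.Module):
    """Simplified bidirectional selective SSM over all initial node features.

    Input:  x0 of shape (N, d), treated as a length-N sequence of nodes.
    Output: (N, d) context-aware embeddings = forward scan + reversed scan
            + beta-weighted residual of the raw features.
    """

    def __init__(self, d: int, d_state: int = 16, beta: float = 0.5):
        super().__init__()
        self.A_log = nn.Parameter(torch.log(torch.arange(1, d_state + 1).float()))
        self.to_Q = nn.Linear(d, d_state)   # input projection (the text's Q)
        self.to_R = nn.Linear(d, d_state)   # output projection (the text's R)
        self.to_delta = nn.Linear(d, 1)
        self.out = nn.Linear(d_state, d)
        self.beta = beta

    def _scan(self, seq: torch.Tensor) -> torch.Tensor:
        """One directional scan; returns an output for every position."""
        A = -torch.exp(self.A_log)
        h = seq.new_zeros(A.numel())
        ys = []
        for x_t in seq:                                   # x_t: (d,)
            delta = F.softplus(self.to_delta(x_t))        # data-dependent step size
            A_bar = torch.exp(delta * A)                  # discretized decay
            h = A_bar * h + delta * self.to_Q(x_t)        # selective state update
            ys.append(self.out(self.to_R(x_t) * h))       # position-wise read-out
        return torch.stack(ys)                            # (N, d)

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        y_fwd = self._scan(x0)                            # forward pass over nodes
        y_bwd = self._scan(x0.flip(0)).flip(0)            # reversed pass, re-aligned
        return y_fwd + y_bwd + self.beta * x0             # context fusion
```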
3. End-to-End Algorithm and Pseudocode
The DMbaGCN workflow consists of the following computational steps:
- Compute multi-hop feature matrices $X^{(0)}, X^{(1)}, \dots, X^{(L)}$.
- For each node $v$, build the sequence $s_v = \big(x_v^{(0)}, \dots, x_v^{(L)}\big)$; initialize LSEMba's data-dependent parameters $(B_v, C_v, \Delta_v)$.
- Discretize the state-space parameters to $(\bar{A}_v, \bar{B}_v)$.
- Run the LSEMba recurrence to obtain $y_v^{\mathrm{local}}$ for every node.
- Create the global feature sequence $S = \big(x_1^{(0)}, \dots, x_N^{(0)}\big)$; initialize GCAMba's $(Q, R, \Delta)$; discretize accordingly.
- Run the bidirectional Mamba recurrence for GCAMba; fuse forward, backward, and residual terms into $y_v^{\mathrm{global}}$.
- Fuse $y_v^{\mathrm{local}}$ and $y_v^{\mathrm{global}}$ into the final node outputs with mixing parameter $\alpha$.
- Apply supervised loss and update parameters via backpropagation.
This workflow enables simultaneous modeling of progressive, node-specific dynamics and holistic graph-wide interactions in a scalable and end-to-end differentiable framework.
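Putting the steps together, a sketch of one full forward pass, reusing the hypothetical `normalized_adjacency`, `LSEMbaSketch`, and `GCAMbaSketch` helpers from the earlier snippets (`num_hops` and `alpha` are illustrative defaults):

```python
import torch

def dmbagcn_forward(adj: torch.Tensor, x: torch.Tensor,
                    lsemba: "LSEMbaSketch", gcamba: "GCAMbaSketch",
                    num_hops: int = 32, alpha: float = 0.5) -> torch.Tensor:
    """Sketch of a DMbaGCN-style forward pass (names are illustrative)."""
    a_hat = normalized_adjacency(adj)                 # from the earlier GCN snippet
    # 1) Multi-hop propagation X^(0..L) and per-node depth trajectories.
    feats = [x]
    for _ in range(num_hops):
        feats.append(a_hat @ feats[-1])
    traj = torch.stack(feats, dim=1)                  # (N, L+1, d)
    # 2) Local stream: selective SSM along each node's trajectory.
    y_local = lsemba(traj)                            # (N, d)
    # 3) Global stream: bidirectional scan over the initial embeddings.
    y_global = gcamba(x)                              # (N, d)
    # 4) Weighted fusion of the two streams.
    return alpha * y_local + (1.0 - alpha) * y_global
```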
4. Theoretical Justification for Over-Smoothing Mitigation
Conventional deep GCNs suffer from degeneration of feature diversity: as depth grows, node embeddings converge toward a common subspace and the pairwise distances between them shrink toward zero. DMbaGCN counteracts this through:
- Adaptive Node Gating (LSEMba): State-space parameters depend on each node’s full feature trajectory, enabling per-node gating and memory.
- Global Recurrence (GCAMba): Bidirectional SSM ensures that node representation incorporates distinguishing global features, preventing collapse into indistinguishable embeddings.
- Feature Variance Preservation: The β-weighted residual preserves initial node feature variance.
- Informal Proposition: Under mild conditions, DMbaGCN's final representations admit a nonzero lower bound on node discriminability (e.g., on the minimum pairwise embedding distance), whereas in vanilla GCNs this quantity decays toward zero with depth.
The input-dependent recurrence and global attention-like expansion mean the network’s Jacobian avoids the rank collapse characteristic of typical GCNs.
5. Computational and Training Characteristics
Recommended hyperparameter settings:
| Parameter | Typical Range | Notes |
|---|---|---|
| Depth ($L$) | 16–32 | Deep stacking enabled by anti-smoothing design |
| Hidden dimension ($d$) | 64 | Per paper; scalable to higher dimensions |
| Mixing coefficient ($\alpha$) | 0.1–0.9 | Dataset-dependent; selected by grid search |
| Optimization | Adam, lr=1e-3 | Weight decay=5e-4, early stop via val loss |
| Data Split | 60/20/20 (train/val/test) | 10 random splits for robust assessment |
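Following the table, a minimal training-step sketch with the stated optimizer settings; `model` is any module mapping $(\hat{A}, X)$ to per-node logits, and early stopping on validation loss would wrap this loop.

```python
import torch
import torch.nn.functional as F

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam with the settings listed above (lr=1e-3, weight decay=5e-4)."""
    return torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)

def train_step(model, optimizer, a_hat, x, labels, train_mask):
    """One supervised step on the training nodes."""
    model.train()
    optimizer.zero_grad()
    logits = model(a_hat, x)                                   # per-node class logits
    loss = F.cross_entropy(logits[train_mask], labels[train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
```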
Performance and resource characteristics:
- Complexity: Per-iteration cost scales linearly with the number of edges $|E|$ and the number of nodes $N$; there are no quadratic attention bottlenecks.
- Memory Use: The per-epoch memory footprint is substantially lower than that of Transformer-based baselines.
- Runtime: Forward/backward passes per epoch (NVIDIA L40 GPU benchmark) are markedly faster than the Transformer baseline.
6. Empirical Performance and Benchmark Results
Experiments utilize six node-classification datasets spanning citation networks, Amazon co-purchase graphs, and co-authorship graphs. Baseline methods include shallow GNNs (GCN, GAT, SGC), deep GNNs (APPNP, GCNII, GPRGNN, SSGC), Transformer-based graph models, and Mamba-based GCN (MbaGCN).
Key findings:
- Test Accuracy: DMbaGCN achieves a 32-layer test accuracy of 95.61% on Amazon Photo (vs. GCN’s 24.03%, SGC’s 79.19%, GCNII’s 92.46%) and 90.44% on Pubmed (vs. GCN’s 45.69%, GPRGNN’s 89.45%). Improvements over single-stream MbaGCN: +1.17% (Pubmed), +1.20% (Photo), +0.77% (Physics).
- Depth Robustness: Accuracy remains within ±0.5% from 2 to 32 layers—contrasting sharply with the over-smoothing degradation of vanilla GCN and even deep static models.
- Efficiency: Comparable runtime and memory to simple GNNs; significantly more efficient than Transformer-style baselines.
7. Comparative Analysis: MbaGCN Backbone and Dual Mamba Enhancement
The original MbaGCN (He et al., 26 Jan 2025) integrates Mamba modules into GNNs via:
- Message Aggregation Layer (MAL): Standard neighborhood aggregation.
- Selective State Space Transition Layer (S3TL): Mamba SSM integration for data-dependent adaptive transitions.
- Node State Prediction Layer (NSPL): Discrete flow control via Gumbel-Softmax gated neighbor selection.
DMbaGCN duplicates the S3TL ("Dual" Mamba), instantiates separate streams for different structural ranges (e.g., 1-hop vs. 2-hop adjacency), and fuses their outputs with a learned gating mechanism (see the sketch below). This dual-stream approach, in conjunction with an explicit over-smoothing regularizer, achieves improved depth-dependent accuracy and more gradual performance degradation at large depth $L$.
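A sketch of one plausible form of such a learned gate, using PyTorch's Gumbel-Softmax in the spirit of the NSPL description; the exact gating architecture and names here are assumptions, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedStreamFusion(nn.Module):
    """Learned per-node gate over two Mamba streams via Gumbel-Softmax (illustrative)."""

    def __init__(self, d: int, hard: bool = False):
        super().__init__()
        self.gate = nn.Linear(2 * d, 2)   # scores the two streams per node
        self.hard = hard                  # hard=True gives a discrete selection

    def forward(self, z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
        logits = self.gate(torch.cat([z1, z2], dim=-1))         # (N, 2)
        g = F.gumbel_softmax(logits, tau=1.0, hard=self.hard)   # (N, 2) gate weights
        return g[:, :1] * z1 + g[:, 1:] * z2                    # per-node mixture
```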
8. Prospects and Potential Extensions
DMbaGCN is end-to-end differentiable and amenable to scalable implementation in frameworks such as PyTorch and Deep Graph Library. A plausible implication is that further architectural extensions—such as multi-stream Mamba, dynamic hop-order selection, or refined regularization penalties—may yield further improvement. The dual Mamba approach is directly applicable to node classification and, due to its efficient global context incorporation, can support inductive learning, transfer across large graphs, and potentially even extension to edge or subgraph-level tasks.
In summary, DMbaGCN leverages node-specific selective state-space modeling and linear-time global attention to deliver deep GNNs that maintain node discriminability across substantial depth and large graph size, with empirically validated efficiency and superiority over both classical and Transformer-based graph models (He et al., 10 Nov 2025, He et al., 26 Jan 2025).