GMANet: Scalable Graph Mamba Network
- GMANet is a state space model-based architecture that leverages hierarchical tokenization and bidirectional SSM layers for efficient graph learning.
- It overcomes message-passing limitations and quadratic complexity by using localized subgraph encoding and selective state-space recurrence.
- Empirical benchmarks demonstrate GMANet's competitive accuracy and resource efficiency on tasks like node classification in COCO-SP and OGBN-Arxiv.
GMANet refers to the “Graph Mamba Network,” a state space model-based architecture for scalable and expressive learning on graphs, introduced in "Graph Mamba: Towards Learning on Graphs with State Space Models" (Behrouz et al., 13 Feb 2024). GMANet generalizes selective state-space sequence modeling (notably the Mamba SSM) to graph-structured data via a combination of hierarchical neighborhood tokenization, bidirectional sequence scanning, and local subgraph encoding. It is designed to circumvent key bottlenecks of message-passing GNNs (over-squashing, limited long-range modeling) and Graph Transformers (quadratic complexity, reliance on positional/structural encodings, PE/SE), enabling efficient, permutation-equivariant, and highly expressive graph learning.
1. Selective State-Space Model Foundations
GMANet is built upon the structured state space model (S4, Mamba) paradigm, operating over token sequences rather than full graph adjacency matrices. At each layer, the input is a sequence of token embeddings $x_1, \dots, x_T$. This sequence is linearly transformed via a small 1D convolution and nonlinearity to yield a context-dependent state-input representation. Input-dependent parameters $(\Delta_t, \mathbf{B}_t, \mathbf{C}_t)$ define a time-varying discrete SSM recurrence $h_t = \bar{\mathbf{A}}_t h_{t-1} + \bar{\mathbf{B}}_t x_t$, $y_t = \mathbf{C}_t h_t$; in the time-invariant (S4) case this recurrence admits a closed-form convolutional representation, while the selective variant is evaluated with an efficient scan. All SSM kernels are locally conditioned on token content, permitting dynamic selection or ignoring of context per token (Behrouz et al., 13 Feb 2024).
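A minimal NumPy sketch of the selective scan helps make the recurrence concrete. The function name `selective_ssm_scan`, the softplus step-size parameterization, and all shapes below are illustrative assumptions, not the paper's implementation:

```python
# Minimal NumPy sketch of a selective (input-dependent) SSM scan.
# Shapes and parameterization are illustrative, not reference code.
import numpy as np

def selective_ssm_scan(x, W_delta, W_B, W_C, A):
    """x: (T, d) token sequence; A: (d, n) state matrix (negative entries for stability)."""
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                              # hidden state per channel
    y = np.zeros((T, d))
    for t in range(T):
        delta = np.logaddexp(0.0, x[t] @ W_delta)     # softplus step size, shape (d,)
        B = x[t] @ W_B                                # input projection, shape (n,)
        C = x[t] @ W_C                                # output projection, shape (n,)
        A_bar = np.exp(delta[:, None] * A)            # discretized transition, (d, n)
        B_bar = delta[:, None] * B[None, :]           # discretized input matrix, (d, n)
        h = A_bar * h + B_bar * x[t][:, None]         # h_t = A_bar * h_{t-1} + B_bar * x_t
        y[t] = (h * C[None, :]).sum(-1)               # y_t = C_t h_t
    return y
```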
2. Hierarchical Tokenization and Ordering Strategies
To bridge the gap between sequence SSMs and unordered graphs, GMANet leverages hierarchical neighborhood tokenization. For each node $v$:
- $s$ repetitions of $m$-hop local subgraphs are drawn, assembled from random walks of length $m$ (for $m = 1, \dots, M$).
- Each sampled token corresponds to the induced subgraph of the randomly reached nodes at hop $m$.
- Tokens for each node are ordered with largest-hop (most global) subgraphs first, recursively down to the singleton node (hop $0$), with random shuffling within hops to ensure permutation equivariance.
This construction yields a node-specific ordered token sequence of length $Ms + 1$; a minimal sampling sketch follows below. For $M = 0$ (node-level tokenization, with each node as a single token), global token sequences are instead constructed by degree/centrality sorting.
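The following sketch illustrates this sampling procedure under the assumptions above ($s$ repetitions per hop, hops $m = 1, \dots, M$, tokens as node sets). The helper `tokenize_node` and the plain-dict adjacency format are hypothetical conveniences, not the reference code:

```python
# Illustrative per-node hierarchical tokenization via random walks.
import random

def tokenize_node(adj, v, M=2, s=2, seed=0):
    """adj: {node: [neighbors]}. Returns an ordered list of node-set tokens for node v."""
    rng = random.Random(seed)
    tokens = []
    for m in range(M, 0, -1):                  # most global (largest hop) first
        hop_tokens = []
        for _ in range(s):                     # s samples per hop
            walk, cur = {v}, v
            for _ in range(m):                 # random walk of length m
                nbrs = adj.get(cur, [])
                if not nbrs:
                    break
                cur = rng.choice(nbrs)
                walk.add(cur)
            hop_tokens.append(frozenset(walk)) # token = node set of the induced subgraph
        rng.shuffle(hop_tokens)                # shuffle within a hop for equivariance
        tokens.extend(hop_tokens)
    tokens.append(frozenset({v}))              # hop-0 singleton token comes last
    return tokens

# Example: a 4-cycle
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(tokenize_node(adj, 0))
```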
3. Bidirectional Selective SSM Layers
A single GMANet layer performs two SSM sequence scans, forward and backward, each parameterized by distinct weights. After processing, the outputs of the backward scan are reversed and added to the forward outputs, followed by a final linear mapping. This makes the layer symmetric with respect to scan direction while permitting deep hierarchical selection across tokens. Multiple such layers are stacked, with per-token linear layers and normalization between them.
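A hedged sketch of the bidirectional combination, reusing the `selective_ssm_scan` sketch above and assuming a simple additive merge followed by a linear output map:

```python
# Sketch of the bidirectional combination in one GMANet layer.
# fwd_params / bwd_params are separate weight tuples for the two scan directions.
def bidirectional_ssm_layer(x, fwd_params, bwd_params, W_out):
    """x: (T, d) token sequence for one node; returns (T, d_out)."""
    y_fwd = selective_ssm_scan(x, *fwd_params)               # forward scan
    y_bwd = selective_ssm_scan(x[::-1], *bwd_params)[::-1]   # backward scan, re-reversed
    return (y_fwd + y_bwd) @ W_out                           # add, then final linear map
```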
4. Local Subgraph Encoding and Positional Encoding
Each tokenized subgraph is embedded via a lightweight encoder:
- Either a localized MPNN (e.g., GatedGCN) or a simple feature summarization (e.g., random-walk features) is applied over the token subgraph.
- Optionally, structural/positional encodings (the leading Laplacian eigenvectors or higher-order random-walk structural encodings) are concatenated to node features when additional structure is required.
- In practice, subgraph encoding provides sufficient inductive bias and structural awareness, often obviating the need for external SE/PE (Behrouz et al., 13 Feb 2024).
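As one concrete example of an optional structural encoding, the sketch below computes random-walk return probabilities (the diagonals of powers of the row-normalized adjacency matrix), a common SE choice; the dense-matrix formulation is for illustration only:

```python
# Hedged sketch of a random-walk structural encoding; GMANet can also omit SE entirely.
import numpy as np

def random_walk_se(A, K=4):
    """A: (n, n) dense adjacency matrix. Returns (n, K) structural encodings."""
    deg = A.sum(1, keepdims=True).clip(min=1.0)
    P = A / deg                                  # row-normalized random-walk matrix
    se, Pk = [], np.eye(A.shape[0])
    for _ in range(K):
        Pk = Pk @ P                              # P^k
        se.append(np.diag(Pk))                   # return probability after k steps
    return np.stack(se, axis=1)
```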
5. Expressivity, Universality, and Theoretical Properties
GMANet exhibits several notable theoretical properties:
- Universality: For any continuous, permutation-equivariant target function $f$ on graphs and any $\epsilon > 0$, a GMANet with suitable parameter and PE choices approximates $f$ to within $\epsilon$.
- Beyond Weisfeiler-Lehman (WL) Limits: With full Laplacian PE, GMANet separates any pair of non-isomorphic graphs, exceeding $k$-WL for all $k$.
- Unbounded expressivity: Even in the absence of PE/SE, the use of sufficiently diverse and long random-walk subgraph tokens allows GMANet to distinguish graph pairs beyond the WL hierarchy (Behrouz et al., 13 Feb 2024).
- These properties are formalized in Propositions 1–3 in the source work.
6. Computational Complexity and Empirical Performance
GMANet achieves linear-time complexity:
- Neighborhood tokenization and local encoding: $\mathcal{O}(n \cdot s \cdot M \cdot k)$ for $n$ nodes, $s$ samples per hop, subgraph size $k$, and $M$ hops; local encoders run on relatively small subgraphs per token.
- Bidirectional SSM sequence scanning: $\mathcal{O}(sM)$ per node per layer (linear in the token-sequence length), across the stacked bidirectional SSM layers.
- In contrast to the $\mathcal{O}(n^2)$ cost of dense Graph Transformer attention (GPS and related models; sparse variants such as Exphormer mitigate this), GMANet is linear in the number of nodes and edges; a back-of-envelope comparison follows below.
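A back-of-envelope comparison (with illustrative, arbitrarily chosen sizes, not figures from the paper) shows the gap between quadratic attention and linear token scanning:

```python
# Dense attention scales with n^2 token pairs; GMANet scans n * (s*M + 1) tokens per layer.
n, s, M = 100_000, 4, 2
dense_attention_pairs = n ** 2                  # ~1e10 pairwise scores per layer
gmanet_tokens = n * (s * M + 1)                 # ~9e5 tokens scanned per layer
print(dense_attention_pairs / gmanet_tokens)    # ~1.1e4x fewer interactions
```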
Empirical benchmarks demonstrate that GMANet (with and without PE/SE):
- Matches or outperforms leading Graph Transformers and MPNNs across the evaluated benchmarks (LRGB, GNN Benchmark, heterophilic graph suites, OGBN-Arxiv).
- Uses 2–5× less GPU memory and is 3–10× faster per epoch than full-attention Graph Transformers.
- Example: On COCO-SP (a node-classification task from the LRGB suite), GMANet achieves 0.397 F1 vs. 0.377 for GPS and 0.343 for Exphormer. On OGBN-Arxiv, GMANet attains 72.48% accuracy using only about 3.9 GB of GPU memory per epoch (Behrouz et al., 13 Feb 2024).
7. Comparison with Preceding Architectures and Future Directions
GMANet’s use of selective state-space models introduces several principled differences:
- Versus MPNNs: Avoids message-passing bottlenecks (“oversquashing”/“oversmoothing”), directly models long-range dependencies, and admits much deeper stacks.
- Versus Graph Transformers: Dispenses with quadratic attention, complex sparse attention engineering, and heavy reliance on positional/structural encoding, while also providing built-in local bias via subgraph tokenization.
- Potential future directions include further scaling via sparse/clustered SSMs, more sophisticated subgraph selection, and applying SSM-based architectures to temporal and multi-relational graphs.
GMANet constitutes a distinct and theoretically principled solution among state-of-the-art graph learning architectures, combining the scalability and flexibility of tokenized SSM scan architectures with the structure-awareness and expressivity traditionally sought in graph neural network design (Behrouz et al., 13 Feb 2024).