Emergent Mind


Attention mechanisms have been widely used to capture long-range dependencies among nodes in Graph Transformers. Bottlenecked by the quadratic computational cost, attention mechanisms fail to scale in large graphs. Recent improvements in computational efficiency are mainly achieved by attention sparsification with random or heuristic-based graph subsampling, which falls short in data-dependent context reasoning. State space models (SSMs), such as Mamba, have gained prominence for their effectiveness and efficiency in modeling long-range dependencies in sequential data. However, adapting SSMs to non-sequential graph data presents a notable challenge. In this work, we introduce Graph-Mamba, the first attempt to enhance long-range context modeling in graph networks by integrating a Mamba block with the input-dependent node selection mechanism. Specifically, we formulate graph-centric node prioritization and permutation strategies to enhance context-aware reasoning, leading to a substantial improvement in predictive performance. Extensive experiments on ten benchmark datasets demonstrate that Graph-Mamba outperforms state-of-the-art methods in long-range graph prediction tasks, with a fraction of the computational cost in both FLOPs and GPU memory consumption. The code and models are publicly available at https://github.com/bowang-lab/Graph-Mamba.


  • Graph-Mamba integrates a Mamba block, a selective state space model (SSM), into a graph network to model long-range dependencies in graph data at a lower computational cost than attention.

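  • The paper highlights two limitations: message-passing Graph Neural Networks (GNNs) struggle to capture long-range dependencies, and Graph Transformers, which capture them through global attention, incur a computational cost that grows quadratically with the number of nodes.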

  • Graph-Mamba outperforms state-of-the-art methods on long-range graph prediction tasks across ten benchmark datasets while using a fraction of their FLOPs and GPU memory.

  • The work has theoretical and practical implications for modeling long-range dependencies in large graphs, with potential applications in fields such as social networks and molecular interactions.

Introduction to Graph-Mamba

Graph-Mamba addresses the problem of efficiently capturing long-range dependencies in graph data. Message-passing Graph Neural Networks (GNNs) propagate information only between neighboring nodes, while Graph Transformers capture global context through attention whose cost grows quadratically with the number of nodes, which limits their scalability to large graphs. Graph-Mamba instead integrates a Mamba block, a selective state space model (SSM), to model long-range dependencies at a lower computational cost.
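
A back-of-the-envelope count makes the scaling difference concrete. The sketch below counts only the dominant multiply-adds of each global mixing operator: the query-key products of full self-attention versus the per-step state updates of a linear recurrence. It ignores projections, softmax, gating and constant factors, and the sizes `d_model=64` and `d_state=16` are illustrative assumptions, not settings taken from the paper.

```python
# Rough multiply-add counts for the dominant term of each global mixing operator.
# Simplification: projections, softmax, normalization and constant factors are ignored.


def attention_macs(num_nodes: int, d_model: int) -> int:
    """Q @ K^T over all node pairs: each of the n*n pairs costs a d_model-length dot product."""
    return num_nodes * num_nodes * d_model


def selective_scan_macs(num_nodes: int, d_model: int, d_state: int) -> int:
    """Linear recurrence: each of the n steps updates a (d_model x d_state) hidden state once."""
    return num_nodes * d_model * d_state


if __name__ == "__main__":
    d_model, d_state = 64, 16  # illustrative sizes, not the paper's hyperparameters
    for n in (1_000, 10_000, 100_000):
        attn = attention_macs(n, d_model)
        scan = selective_scan_macs(n, d_model, d_state)
        print(f"n={n:>7}: attention={attn:.2e}  scan={scan:.2e}  ratio={attn / scan:.0f}x")
```

Under this simplified count the ratio between the two is `num_nodes / d_state`, so the advantage of the linear-time scan widens as graphs grow.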

Overview of Graph Neural Networks and Graph Transformers

Graph Neural Networks (GNNs) have been pivotal in handling graph-structured data, using message passing to update each node's representation from its local neighborhood. Because each layer only exchanges information between adjacent nodes, relating distant nodes requires many stacked layers, and these methods often struggle to capture long-range dependencies. Graph Transformers address this by letting every node attend to every other node, enabling global information exchange, but the cost of full attention grows quadratically with the number of nodes.
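
To illustrate why purely local message passing struggles with long-range dependencies, the sketch below applies a simple mean-aggregation layer (a hypothetical stand-in, not a specific GNN from the paper) to a path graph and counts how many layers it takes for a signal at one end to reach the other. Information travels one hop per layer, so two nodes k hops apart only interact after k layers.

```python
import numpy as np


def mean_aggregate(adj: np.ndarray, h: np.ndarray) -> np.ndarray:
    """One round of message passing: each node averages its own and its neighbors' features."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops so a node keeps its own state
    deg = a_hat.sum(axis=1, keepdims=True)   # number of contributors per node
    return (a_hat @ h) / deg


def path_graph(n: int) -> np.ndarray:
    """Adjacency matrix of the path 0 - 1 - ... - (n-1)."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def layers_until_reached(n: int) -> int:
    """Count message-passing rounds before a signal at node 0 first reaches node n-1."""
    adj = path_graph(n)
    h = np.zeros((n, 1))
    h[0] = 1.0                               # one-hot signal at the first node
    for k in range(1, n + 1):
        h = mean_aggregate(adj, h)
        if h[n - 1, 0] > 0:                  # the last node has received the signal
            return k
    return -1


if __name__ == "__main__":
    print(layers_until_reached(10))
```

On a 10-node path the call returns 9: the depth required grows with the distance between nodes, which is the gap that global attention, and in Graph-Mamba a selective SSM, is meant to close.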

The Advent of Graph-Mamba

Graph-Mamba adapts selective state space models to graph data. Rather than attending to every node, the Mamba block processes nodes as a sequence and, at each recurrence step, uses input-dependent parameters to filter what enters and persists in its state, so that contextually relevant nodes dominate and long-range context modeling improves. Integrating the Mamba block into a Graph Transformer framework provides an alternative to attention sparsification techniques, combining data-dependent context selection with computation that is linear in the number of nodes.
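
The selective recurrence described above can be sketched in NumPy. This is a simplified, sequential version written for readability, assuming a diagonal state matrix `A` and plain linear projections for the input-dependent step size `delta` and the matrices `B_t` and `C_t`; all parameter names are hypothetical. Real Mamba implementations use a fused, hardware-aware parallel scan and add convolution and gating layers, all omitted here.

```python
import numpy as np


def softplus(z: np.ndarray) -> np.ndarray:
    return np.logaddexp(0.0, z)  # numerically stable log(1 + exp(z))


def selective_scan(x, A, W_delta, b_delta, W_B, W_C, D):
    """Sequential selective SSM scan over an (L, d_model) sequence (hypothetical parameterization).

    x:        (L, d_model)        input sequence, e.g. node embeddings in some chosen order
    A:        (d_model, d_state)  negative real diagonal state matrix, one row per channel
    W_delta:  (d_model, d_model), b_delta: (d_model,)  -> input-dependent step size
    W_B, W_C: (d_model, d_state)                       -> input-dependent B_t and C_t
    D:        (d_model,)          skip connection
    """
    L, d_model = x.shape
    h = np.zeros((d_model, A.shape[1]))
    ys = np.empty((L, d_model))
    for t in range(L):
        x_t = x[t]
        delta = softplus(x_t @ W_delta + b_delta)   # (d_model,) step size chosen from the input
        B_t = x_t @ W_B                              # (d_state,) how the input is written to state
        C_t = x_t @ W_C                              # (d_state,) how the state is read out
        A_bar = np.exp(delta[:, None] * A)           # zero-order-hold decay, values in (0, 1)
        Bx = delta[:, None] * B_t[None, :] * x_t[:, None]  # simplified (Euler) input term
        h = A_bar * h + Bx                           # O(d_model * d_state) work per step
        ys[t] = h @ C_t + D * x_t                    # readout plus skip connection
    return ys


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d_model, d_state = 6, 4, 3
    x = rng.standard_normal((L, d_model))
    A = -np.exp(rng.standard_normal((d_model, d_state)))
    y = selective_scan(
        x, A,
        0.1 * rng.standard_normal((d_model, d_model)), np.zeros(d_model),
        rng.standard_normal((d_model, d_state)), rng.standard_normal((d_model, d_state)),
        np.ones(d_model),
    )
    print(y.shape)
```

Because `delta`, `B_t` and `C_t` are computed from each input, the recurrence can decide per element how much to write into or retain in its state, which is the input-dependent selection the paragraph refers to. The loop does a fixed amount of work per element, so the total cost is linear in sequence length, and each output depends only on earlier elements.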

Key Contributions and Technical Innovations

  • Graph-Centric Node Prioritization: Introduces graph-centric node prioritization and permutation strategies that determine the order in which nodes enter the Mamba block, enhancing context-aware reasoning by focusing on important nodes.

  • Adapting SSMs to Non-Sequential Data: Extends SSMs, originally designed for sequences, to graph-structured data, retaining their efficient sequence modeling while addressing the challenge that graph nodes have no inherent order.

  • Performance and Efficiency: Demonstrates superior predictive performance on ten benchmark datasets for long-range graph prediction tasks, together with substantial reductions in FLOPs and GPU memory consumption.
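
The prioritization and permutation idea can be sketched as an ordering step applied before the scan. The sketch assumes, as one plausible heuristic, that node degree serves as the importance score and that more important nodes are placed later in the sequence, so that a causal scan gives them the most preceding context. Nodes with equal degree are shuffled randomly, one way to realize a permutation strategy. The function names are hypothetical.

```python
import numpy as np


def prioritized_order(degrees: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Node indices sorted by ascending degree, with random order among equal-degree nodes.

    Assumption: degree is the importance score; high-degree nodes go last so a causal
    scan lets them see the most context.
    """
    tie_break = rng.permutation(len(degrees))  # random secondary key
    # np.lexsort uses the LAST key as the primary key: degree first, then the random key
    return np.lexsort((tie_break, degrees))


def inverse_permutation(order: np.ndarray) -> np.ndarray:
    """Indices that map the reordered sequence back to the original node order."""
    inv = np.empty_like(order)
    inv[order] = np.arange(len(order))
    return inv


if __name__ == "__main__":
    degrees = np.array([3, 1, 2, 1, 5])
    order = prioritized_order(degrees, np.random.default_rng(0))
    print(order, degrees[order])                  # degrees come out non-decreasing
    feats = np.arange(5) * 10                     # stand-in node features
    restored = feats[order][inverse_permutation(order)]
    print(restored)                               # original node order recovered
```

Sorting, running the sequence model, and then applying the inverse permutation keeps node outputs aligned with the original graph, so the result can be combined with other node-level layers.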

Experimental Validation

Extensive experiments on ten benchmark datasets show that Graph-Mamba outperforms existing state-of-the-art methods in long-range graph prediction tasks. It does so at a fraction of the computational cost, in both FLOPs and GPU memory consumption, supporting its efficiency and effectiveness on large graphs.

Theoretical and Practical Implications

Graph-Mamba's approach has theoretical implications for the further development of GNN and Graph Transformer models, particularly for long-range dependency modeling. Practically, its efficiency and scalability make it an attractive choice for applications with large graph-structured data, such as social networks, molecular interactions, and brain connectivity analysis. Future work may explore deeper integration of SSMs within graph modeling frameworks and extend the model to even larger datasets.


Graph-Mamba presents a significant advancement in the modeling of long-range dependencies within large graphs. By integrating selective state space models into graph networks, it offers a path towards scalable, efficient, and effective graph representation learning. This work not only demonstrates notable improvements in predictive performance but also opens new avenues for future research in graph-based machine learning.
