Exphormer: Sparse Transformers for Graphs
The paper "Exphormer: Sparse Transformers for Graphs" introduces an innovative approach to scaling graph transformers efficiently while maintaining robust performance across various datasets. Transformers have become increasingly popular for graph-based tasks due to their ability to model long-range dependencies and interactions effectively. However, their scalability to large graphs has often been hindered by quadratic complexity in the number of nodes. Exphormer addresses this challenge by incorporating sparse attention mechanisms into graph transformers.
The core contribution of Exphormer is a sparse attention mechanism built on virtual global nodes and expander graphs, whose mathematical properties, such as spectral expansion and pseudorandomness, reduce the computational complexity to linear in the graph size while retaining essential connectivity across nodes. The framework scales to larger graphs than previous graph transformer architectures could handle, addressing the computational bottleneck in large-scale graph data processing.
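As a rough illustration of this construction, the sketch below assembles a sparse attention pattern from three ingredients: the input graph's own edges, the edges of a random regular graph standing in for the expander, and star edges to a small number of virtual global nodes. The helper name, the choice of a random regular graph as the expander, and the inclusion of local edges are assumptions made for illustration, not the paper's exact recipe.

```python
# Illustrative sketch (not the paper's code): building a sparse attention
# pattern in the spirit of Exphormer from local, expander, and global edges.
import networkx as nx

def sparse_attention_edges(num_nodes, local_edges, expander_degree=4, num_global=1, seed=0):
    """Return the directed (src, dst) pairs that attention is restricted to."""
    edges = set()

    # 1. Local edges: keep the input graph's own edges, in both directions.
    for u, v in local_edges:
        edges.add((u, v))
        edges.add((v, u))

    # 2. Expander edges: a random d-regular graph is an expander with high
    #    probability, giving sparse long-range shortcuts between nodes.
    expander = nx.random_regular_graph(expander_degree, num_nodes, seed=seed)
    for u, v in expander.edges():
        edges.add((u, v))
        edges.add((v, u))

    # 3. Global virtual nodes: each virtual node attends to and is attended by
    #    every real node (a star topology); virtual nodes are appended after
    #    the real ones.
    for g in range(num_nodes, num_nodes + num_global):
        for v in range(num_nodes):
            edges.add((g, v))
            edges.add((v, g))

    # Self-loops so every node (real or virtual) can attend to itself.
    for v in range(num_nodes + num_global):
        edges.add((v, v))
    return sorted(edges)

# Edge count grows as O(m + d*n + g*n), i.e. linearly in n, instead of O(n^2).
print(len(sparse_attention_edges(num_nodes=6, local_edges=[(0, 1), (1, 2), (4, 5)])))
```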
Key Findings and Numerical Results
- Sparse Attention Mechanisms: Exphormer employs a combination of global nodes and expander graphs. Global nodes let every node in the graph communicate through a few virtual nodes, mimicking a star topology, while expander graphs provide efficient long-range communication through a sparse yet well-connected structure (see the edge-restricted attention sketch after this list).
- Theoretical Properties: The authors prove that Exphormer preserves key spectral properties of full attention with only a linear number of edges, yielding robustness in cut approximations and good mixing properties for random walks (a numerical spectral-gap check follows this list).
- Empirical Performance: The paper presents compelling empirical evidence that Exphormer achieves state-of-the-art results on several benchmarks:
  - Exphormer outperformed other sparse transformer models, such as BigBird and Performer, on datasets like CIFAR10, MNIST, and MalNet-Tiny.
  - The framework matched or exceeded the performance of dense transformer models on datasets requiring long-range dependencies, demonstrating competitive accuracy with fewer parameters.
- Scalability: By utilizing expander graphs and virtual nodes, Exphormer takes a decisive step towards scaling transformer architectures to larger graph datasets. The architecture proved capable of processing graphs with over 10,000 nodes, offering significant advantages in training batch size and memory efficiency (see the back-of-envelope pair count after this list).
- Universal Approximation Properties: The paper establishes that Exphormer can approximate any continuous sequence-to-sequence function with sparse transformers possessing trainable positional encoding, given that the expander graph or global nodes satisfy certain connectivity properties.
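To make the star-plus-expander pattern concrete, here is a minimal, hypothetical single-head attention pass restricted to an explicit edge set, written in plain NumPy rather than taken from the paper's implementation: only the listed (source, destination) pairs contribute to each node's softmax.

```python
# Illustrative sketch: attention restricted to a sparse edge set.
import numpy as np

def sparse_attention(X, edges, Wq, Wk, Wv):
    """Single-head attention where node `dst` only attends over sources wired to it."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[1]
    out = np.zeros_like(V)
    # Group incoming edges by destination node.
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append(src)
    for dst, srcs in incoming.items():
        scores = K[srcs] @ Q[dst] / np.sqrt(d_k)   # query of dst against keys of its sources
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                   # softmax over the allowed sources only
        out[dst] = weights @ V[srcs]
    return out

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
# A toy pattern: self-loops plus a star through node 0 acting as a "global" node.
edges = [(v, v) for v in range(n)] + [(0, v) for v in range(1, n)] + [(v, 0) for v in range(1, n)]
print(sparse_attention(X, edges, Wq, Wk, Wv).shape)   # (5, 8)
```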
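The expansion claim behind the theoretical properties can also be sanity-checked numerically: a random regular graph keeps a normalized spectral gap bounded away from zero as it grows, while a poorly connected graph such as a cycle does not. The snippet below is a rough check under that assumption, not a reproduction of the paper's proofs.

```python
# Illustrative check of the spectral gap that makes expander edges good "mixers".
import networkx as nx
import numpy as np

def normalized_spectral_gap(G):
    A = nx.to_numpy_array(G)
    deg = A.sum(axis=1)
    A_norm = A / np.sqrt(np.outer(deg, deg))       # D^{-1/2} A D^{-1/2}
    eig = np.sort(np.linalg.eigvalsh(A_norm))[::-1]
    return 1.0 - eig[1]                             # gap between the top two eigenvalues

n = 200
print(normalized_spectral_gap(nx.random_regular_graph(4, n, seed=0)))  # stays roughly constant
print(normalized_spectral_gap(nx.cycle_graph(n)))                      # shrinks like O(1/n^2)
```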
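A back-of-envelope count shows where the memory savings behind the scalability claim come from; the node, edge, and degree figures below are illustrative assumptions, not numbers reported in the paper.

```python
# Illustrative pair count: full attention vs. the sparse pattern sketched above.
n = 10_000          # nodes (assumed)
m = 40_000          # input-graph edges (assumed average degree ~8)
d_exp = 4           # expander degree (assumed)
g = 1               # virtual global nodes (assumed)

dense_pairs  = n * n                               # full attention: 100,000,000 pairs
sparse_pairs = 2 * m + d_exp * n + 2 * g * n + n   # local + expander + global + self-loops
print(dense_pairs, sparse_pairs)                   # ~100M vs ~150K attention pairs
```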
Implications and Future Directions
From a theoretical standpoint, the paper enriches our understanding of sparse graph attention mechanisms and their ability to preserve essential properties of denser architectures while reducing computational burdens. Practically, Exphormer provides a versatile and scalable foundation for deploying transformers on large-scale graph datasets prevalent in social networks, bioinformatics, and knowledge graphs.
Looking to the future, this work could inspire further exploration into adaptive mechanisms for graph sparsity that dynamically adjust based on the graph's specific properties or the task at hand. Moreover, leveraging Exphormer's framework in distributed settings could further enhance its scalability given the inherent challenges of processing large graph data.
Overall, Exphormer represents a significant advancement in the efficient application of transformer architectures to complex graph-oriented tasks, setting a promising foundation for future research devoted to optimizing transformer-based models for real-world graph data challenges.