- The paper introduces Graph-Bert, which replaces traditional graph convolutions with a pure attention mechanism for efficient representation learning.
- It utilizes sampled linkless subgraphs and diverse embeddings to overcome scalability and over-smoothing issues in deep graph neural networks.
- Graph-Bert's pre-training strategy accelerates convergence and boosts performance on node classification and clustering tasks across standard benchmarks.
Overview of Graph-Bert: Attention-Based Graph Representation Learning
The paper introduces Graph-Bert, a graph neural network (GNN) built purely on the attention mechanism. The model departs from traditional GNNs by dropping graph convolution and aggregation operators, which frequently cause performance issues such as the suspended-animation and over-smoothing problems.
Key Insights and Methodology
Graph-Bert applies an attention mechanism in the style of the Transformer architecture. By training on sampled linkless subgraphs, it sidesteps several challenges faced by existing GNNs, particularly on large graphs where memory constraints prevent batching over all nodes. The model can be trained standalone, and its learned representations can be transferred to other tasks with or without fine-tuning.
Graph-Bert's Design:
- Subgraph Sampling: Nodes are sampled into linkless subgraphs according to an intimacy metric, such as the PageRank-based intimacy matrix. Each target node is processed together with its most intimate context nodes rather than the entire graph structure (see the first sketch after this list).
- Embedding Techniques: The model combines several input embeddings, namely raw feature vector embeddings, Weisfeiler-Lehman role embeddings, intimacy-based relative positional embeddings, and hop-based relative distance embeddings. Together these encode node attributes, structural roles, and each node's position and distance within the sampled subgraph (second sketch below).
- Graph Transformer Encoder: Inspired by the Transformer architecture, Graph-Bert stacks multiple attention layers to update node representations iteratively. The encoder itself consults no graph links; structural information enters only through the input embeddings (third sketch below).
- Training and Fine-Tuning: Pre-training is performed on node attribute reconstruction and graph structure recovery (fourth sketch below). Graph-Bert is then fine-tuned for specific downstream tasks such as node classification and graph clustering.
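A minimal NumPy sketch of the intimacy-based sampling, assuming an unweighted adjacency matrix `A` and the paper's closed-form PageRank intimacy S = α(I − (1 − α)Ā)⁻¹ with column-normalised adjacency Ā. The dense matrix inverse and the helper names (`intimacy_matrix`, `sample_linkless_subgraph`) are illustrative only and would not scale to large graphs as written.

```python
import numpy as np

def intimacy_matrix(A, alpha=0.15):
    """PageRank-based intimacy: S = alpha * (I - (1 - alpha) * A_bar)^-1,
    where A_bar is the column-normalised adjacency matrix."""
    n = A.shape[0]
    col_sums = np.maximum(A.sum(axis=0, keepdims=True), 1)  # guard against isolated nodes
    A_bar = A / col_sums
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * A_bar)

def sample_linkless_subgraph(S, target, k=5):
    """Return the target node plus its k most intimate context nodes.
    The sampled context keeps no edges, hence 'linkless'."""
    scores = S[target].copy()
    scores[target] = -np.inf                 # exclude the target itself
    context = np.argsort(-scores)[:k]        # top-k by intimacy score
    return [target] + context.tolist()

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
S = intimacy_matrix(A)
print(sample_linkless_subgraph(S, target=0, k=2))  # target node and its two most intimate neighbours
```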
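A sketch of how the four input embeddings could be combined for one node of a sampled subgraph, assuming Transformer-style sinusoidal embeddings are applied to the integer WL code, intimacy-rank position, and hop distance, and that the embeddings are aggregated by summation. The projection matrix `W` for the raw features is an assumption of this sketch.

```python
import numpy as np

def sinusoidal_embed(value, d_model):
    """Transformer-style sinusoidal embedding of a single integer code
    (WL role, intimacy-based position, or hop distance)."""
    i = np.arange(d_model)
    angles = value / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def initial_node_embedding(x_raw, W, wl_code, position, hop_distance):
    """Sum the four Graph-Bert input embeddings for one node.
    x_raw: raw feature vector; W: assumed linear projection to d_model dims."""
    d_model = W.shape[1]
    return (x_raw @ W
            + sinusoidal_embed(wl_code, d_model)
            + sinusoidal_embed(position, d_model)
            + sinusoidal_embed(hop_distance, d_model))
```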
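The encoder can be sketched with standard PyTorch building blocks; the point is that nothing in the layer consults the adjacency matrix, and attention runs over all k + 1 nodes of the sampled subgraph. Layer sizes and the feed-forward design here are illustrative, not the paper's exact hyper-parameters.

```python
import torch
import torch.nn as nn

class GraphTransformerLayer(nn.Module):
    """One attention layer over the nodes of a sampled linkless subgraph.
    No edges are used inside the layer; structure enters only via the embeddings."""
    def __init__(self, d_model=32, n_heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, h):                  # h: (batch, k + 1, d_model)
        a, _ = self.attn(h, h, h)          # full attention over subgraph nodes
        h = self.norm1(h + a)
        return self.norm2(h + self.ff(h))

# Stacking D layers gives the deep architectures evaluated in the paper.
encoder = nn.Sequential(*[GraphTransformerLayer() for _ in range(4)])
```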
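Finally, a hedged sketch of the two pre-training objectives, assuming a hypothetical `decoder` module that maps node representations back to raw attributes, and that pairwise cosine similarities between representations are regressed onto the corresponding intimacy-matrix entries for structure recovery.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(z, x_raw, s_true, decoder):
    """z: (n, d) node representations from the encoder;
    x_raw: (n, f) raw node attributes; s_true: (n, n) intimacy entries;
    decoder: assumed module reconstructing raw attributes from z."""
    # Task 1: node raw attribute reconstruction
    recon_loss = F.mse_loss(decoder(z), x_raw)
    # Task 2: graph structure recovery via pairwise cosine similarity
    z_norm = F.normalize(z, dim=1)
    struct_loss = F.mse_loss(z_norm @ z_norm.t(), s_true)
    return recon_loss + struct_loss
```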
Experimental Findings
The experimental evaluation reports strong node classification performance on the benchmark datasets (Cora, Citeseer, Pubmed), outperforming state-of-the-art models such as GCN and GAT. Key findings include:
- Effective Deep Architectures: Graph-Bert avoids the suspended-animation problem that plagues deep GNNs and remains effective in architectures with up to 50 layers.
- Subgraph Size: Subgraph size significantly impacts performance, with the optimal size varying across datasets; computational cost remains manageable even at larger sizes.
- Pre-training Benefits: Pre-training accelerates convergence and provides better initialization for downstream tasks. Combining node attribute reconstruction with graph structure recovery yields representations that capture both attribute and structural information.
Implications and Future Directions
Graph-Bert's reliance on attention marks a promising shift in graph representation learning. Decoupling from traditional graph convolution permits parallelization and application to large graphs, which is crucial for real-world scalability.
Potential Directions:
- Further exploration of subgraph sampling strategies could yield richer local contexts for the encoder.
- Applying Graph-Bert to dynamic graph scenarios or heterogeneous graphs might reveal additional strengths.
- Integrating Graph-Bert with other neural network paradigms could unlock more comprehensive modeling capabilities.
Overall, Graph-Bert represents a significant innovation in graph neural networks, leveraging the power of attention to redefine how graph data is processed and understood. The ability to train effectively without full reliance on graph structure opens new avenues for scalable, efficient, and versatile graph learning applications.