Do Transformers Really Perform Bad for Graph Representation? (2106.05234v5)

Published 9 Jun 2021 in cs.LG and cs.AI

Abstract: The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture, and could attain excellent results on a broad range of graph representation learning tasks, especially on the recent OGB Large-Scale Challenge. Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model. To this end, we propose several simple yet effective structural encoding methods to help Graphormer better model graph-structured data. Besides, we mathematically characterize the expressive power of Graphormer and exhibit that with our ways of encoding the structural information of graphs, many popular GNN variants could be covered as the special cases of Graphormer.

Authors (8)
  1. Chengxuan Ying (3 papers)
  2. Tianle Cai (34 papers)
  3. Shengjie Luo (20 papers)
  4. Shuxin Zheng (32 papers)
  5. Guolin Ke (43 papers)
  6. Di He (108 papers)
  7. Yanming Shen (17 papers)
  8. Tie-Yan Liu (242 papers)
Citations (425)

Summary

Insights into "Do Transformers Really Perform Bad for Graph Representation?"

The paper examines why Transformer architectures, despite their dominance in natural language processing and computer vision, have not matched mainstream Graph Neural Networks (GNNs) on graph-level prediction leaderboards. It addresses this question by introducing Graphormer, a model built on the standard Transformer architecture.

Graphormer: The Proposed Solution

Graphormer, as presented in the paper, keeps the standard Transformer architecture and adds a small set of mechanisms that encode graph structure. Its design rests on three encodings, combined in the sketch that follows this list:

  1. Centrality Encoding: Node centrality indicates how important each node is within the graph. Graphormer injects this signal directly into the input by adding learnable embedding vectors, indexed by node degree, to the node features.
  2. Spatial Encoding: Graphs have no canonical linear order, so standard positional encodings do not apply. Graphormer instead adds a learnable bias to the self-attention scores that depends on the shortest path distance (SPD) between each pair of nodes, injecting relational structure into the attention computation.
  3. Edge Encoding: To exploit the relational information carried by edges, Graphormer adds a second bias term to the attention scores, computed from the features of the edges along the shortest path between each node pair. This strengthens the model's capacity for capturing inter-node dependencies.
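To make the three mechanisms concrete, the following is a minimal, single-head PyTorch sketch of how they could slot into standard self-attention. It is an illustration rather than the authors' implementation; names such as GraphormerAttentionSketch, max_degree, and max_spd are assumptions. Degree embeddings are added to the node features, while SPD-indexed and edge-derived biases are added to the attention scores.

```python
# Minimal sketch of Graphormer-style structural encodings (illustrative, not the official code).
# Assumes a single attention head, precomputed node degrees, an SPD matrix, and an
# optional precomputed edge-encoding bias per node pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphormerAttentionSketch(nn.Module):
    def __init__(self, dim, max_degree=64, max_spd=32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.degree_emb = nn.Embedding(max_degree, dim)  # centrality encoding
        self.spd_bias = nn.Embedding(max_spd, 1)         # spatial encoding

    def forward(self, x, degree, spd, edge_bias=None):
        # x: [n, dim] node features, degree: [n] node degrees,
        # spd: [n, n] shortest-path distances, edge_bias: [n, n] optional
        # average of projected edge features along each shortest path.
        degree = degree.clamp_max(self.degree_emb.num_embeddings - 1)
        spd = spd.clamp_max(self.spd_bias.num_embeddings - 1)
        h = x + self.degree_emb(degree)                   # add centrality encoding to inputs
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = q @ k.t() / (h.size(-1) ** 0.5)          # standard scaled dot-product scores
        scores = scores + self.spd_bias(spd).squeeze(-1)  # SPD-indexed spatial bias
        if edge_bias is not None:
            scores = scores + edge_bias                   # edge-encoding bias
        return F.softmax(scores, dim=-1) @ v
```

The key design point is that all three encodings are additive and learned, so the Transformer backbone itself is left unchanged.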

Strong Numerical Results & Empirical Validation

Graphormer was evaluated on several benchmark datasets, including the Open Graph Benchmark Large-Scale Challenge (OGB-LSC), and showed clear gains over mainstream GNN models. On the OGB-LSC quantum chemistry regression dataset in particular, it surpassed state-of-the-art GNN variants by more than 10% in relative error reduction.

Theoretical Implications and Future Directions

The theoretical analysis characterizes Graphormer's expressive power: with the proposed structural encodings, many popular GNN variants can be expressed as special cases of Graphormer. This result motivates applying Transformers in domains previously dominated by GNNs and opens research avenues for improving and scaling Transformer architectures on graph data.
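For intuition, a simplified version of the paper's construction runs as follows (notation condensed here; see the paper for the precise statement): choose the query and key projections so that the pre-bias attention scores are constant, and set the spatial bias to $b_{\phi(v_i,v_j)} = 0$ when $\phi(v_i,v_j) = 1$ and $-\infty$ otherwise. Softmax attention then spreads its weight uniformly over 1-hop neighbors, and a single head computes

\[ \mathrm{Attn}(h_i) \;=\; \frac{1}{|\mathcal{N}(v_i)|} \sum_{j \in \mathcal{N}(v_i)} h_j W_V, \]

which is the mean-aggregation step used by common message-passing GNNs.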

Practically, these findings suggest that, with appropriate structural encodings, Transformers can be viable and even superior alternatives for graph representation learning. However, the quadratic complexity of self-attention over nodes remains a potential limitation on large graphs. Future research might explore scalable attention mechanisms or new encoding frameworks to broaden the applicability of Transformers in the graph domain.

In conclusion, although plain Transformers had not previously topped mainstream graph-level leaderboards, this paper provides compelling evidence of their capability: Graphormer delivers strong practical results together with theoretical insight into the model's expressiveness and adaptability.