An Overview of "Pure Transformers are Powerful Graph Learners"
The paper "Pure Transformers are Powerful Graph Learners" investigates whether a standard Transformer, without any graph-specific modifications, can serve as a strong graph learner. The authors present Tokenized Graph Transformer (TokenGT), which treats the nodes and edges of a graph as independent tokens and feeds the resulting token sequence into a plain Transformer. Graph structure is encoded entirely in the token embeddings, avoiding the graph-specific inductive biases built into Graph Neural Networks (GNNs).
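As an illustration, the tokenization amounts to stacking node and edge feature vectors into one sequence and passing it through an off-the-shelf Transformer encoder. The following is a minimal sketch, not the authors' implementation: it omits the structural identifiers described below, and names such as `node_feats` and `edge_feats` are illustrative.

```python
# Minimal sketch: a small graph as a plain token sequence fed to a standard Transformer.
import torch
import torch.nn as nn

n_nodes, n_edges, d = 5, 6, 16

node_feats = torch.randn(n_nodes, d)                   # one feature vector per node
edge_index = torch.randint(0, n_nodes, (n_edges, 2))   # (u, v) endpoint pairs
edge_feats = torch.randn(n_edges, d)                   # one feature vector per edge

# Treat every node and every edge as an independent token: an (n + m) x d sequence.
tokens = torch.cat([node_feats, edge_feats], dim=0).unsqueeze(0)  # (1, n+m, d)

# A completely standard Transformer encoder, with no graph-specific modules.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2,
)
out = encoder(tokens)             # (1, n+m, d) contextualized node/edge tokens
graph_repr = out.mean(dim=1)      # e.g. mean-pool the tokens for a graph-level readout
```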
Key Contributions and Theoretical Insights
- Expressiveness: The authors prove that, with appropriate token embeddings, the method is at least as expressive as a second-order invariant graph network (2-IGN). This is significant because 2-IGN is already more expressive than all message-passing GNNs, so TokenGT can in principle distinguish graph structures that such GNNs cannot.
- Tokens and Embeddings: Nodes and edges are both treated as tokens and augmented with orthonormal node identifiers and trainable type identifiers that distinguish node tokens from edge tokens. These identifiers carry the graph's connectivity into the token embeddings, so the Transformer can learn meaningful representations without any explicit graph-specific design (a minimal sketch follows this list).
- Theoretical Guarantees: The authors extend their analysis to hypergraphs, showing that a Transformer with order-k token embeddings is at least as expressive as k-IGN and therefore aligns with the k-Weisfeiler-Lehman (WL) hierarchy. This positions TokenGT alongside, or beyond, the expressive power of traditional GNNs.
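One way to realize the node and type identifiers described above, as a hedged sketch in PyTorch: node identifiers are taken as rows of a random orthonormal matrix (Laplacian eigenvectors are the alternative the paper mentions), each node token is concatenated with its own identifier twice, each edge token with the identifiers of its two endpoints, and a trainable type embedding marks tokens as nodes or edges. Whether the type identifier is added or concatenated is a design detail; this sketch adds it, and helper names such as `node_ids` and `type_emb` are illustrative.

```python
# Sketch of TokenGT-style token embeddings: orthonormal node identifiers + type identifiers.
import torch
import torch.nn as nn

n_nodes, n_edges, d_feat, d_id = 5, 6, 16, 8

node_feats = torch.randn(n_nodes, d_feat)
edge_index = torch.randint(0, n_nodes, (n_edges, 2))
edge_feats = torch.randn(n_edges, d_feat)

# Orthonormal node identifiers via QR of a random Gaussian matrix
# (d_id must be at least n_nodes for exactly orthonormal rows).
q, _ = torch.linalg.qr(torch.randn(d_id, n_nodes))   # q: (d_id, n_nodes), orthonormal columns
node_ids = q.T                                        # (n_nodes, d_id), orthonormal rows

# Node token for v:      [x_v,  P_v, P_v]
# Edge token for (u, v): [x_uv, P_u, P_v]
u, v = edge_index[:, 0], edge_index[:, 1]
node_tokens = torch.cat([node_feats, node_ids, node_ids], dim=-1)
edge_tokens = torch.cat([edge_feats, node_ids[u], node_ids[v]], dim=-1)

# Trainable type identifiers tell the Transformer which tokens are nodes and which are edges.
type_emb = nn.Embedding(2, d_feat + 2 * d_id)         # index 0 = node, 1 = edge
node_tokens = node_tokens + type_emb(torch.zeros(n_nodes, dtype=torch.long))
edge_tokens = edge_tokens + type_emb(torch.ones(n_edges, dtype=torch.long))

tokens = torch.cat([node_tokens, edge_tokens], dim=0)  # ready for a standard Transformer
```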
Empirical Evaluation
The authors validate their theoretical findings on the PCQM4Mv2 dataset, a large-scale quantum-chemistry benchmark of molecular graphs. TokenGT outperforms GNN baselines and achieves results competitive with Transformer variants that incorporate strong graph-specific inductive biases. The consistent performance across settings supports the robustness and versatility of the approach.
Implications and Future Work
- Practical Utility: Because TokenGT treats a graph as a plain sequence of tokens, it lowers the barrier to combining graph data with other modalities in multitask learning settings. This flexibility is particularly beneficial in applications that must process heterogeneous data simultaneously.
- Scalability Considerations: Although TokenGT can effectively approximate permutation-equivariant functions on graphs, scalability remains a concern because self-attention scales quadratically with the number of tokens (nodes plus edges). The authors point to kernelized attention as one way to reduce this cost (see the sketch after this list).
- Potential Improvements: Directions for future work include refining the choice of node identifiers and exploring sparse edge representations to improve performance further while preserving the theoretical guarantees.
- Impact on Graph Learning Paradigms: By demonstrating that standard Transformers can serve as powerful graph learners, the paper challenges the prevailing narrative that complex graph-specific architectures are necessary for such tasks. This opens doors to new research avenues in autoregressive processing, in-context learning, and more seamless integration of graph data into general-purpose models.
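To make the scalability point concrete, here is an illustrative kernelized-attention sketch using the ELU(x) + 1 feature map from the linear-attention literature; the paper itself relies on Performer-style kernel attention, but the underlying idea of never materializing the full (n + m) x (n + m) attention matrix is the same. The function name `kernelized_attention` is illustrative.

```python
# Linear-time attention sketch: O(L * d^2) instead of O(L^2 * d) for L tokens.
import torch
import torch.nn.functional as F

def kernelized_attention(q, k, v, eps=1e-6):
    """Kernelized attention with the non-negative feature map phi(x) = elu(x) + 1.

    q, k, v: (batch, length, dim) tensors.
    """
    q = F.elu(q) + 1                                   # phi(q), all entries > 0
    k = F.elu(k) + 1                                   # phi(k)
    kv = torch.einsum("bld,ble->bde", k, v)            # sum_l phi(k_l) v_l^T, shape (b, d, e)
    z = 1.0 / (torch.einsum("bld,bd->bl", q, k.sum(dim=1)) + eps)  # per-query normalizer
    return torch.einsum("bld,bde,bl->ble", q, kv, z)   # (b, l, e), no L x L matrix formed

# Usage on a sequence of 1,000 node/edge tokens:
q = k = v = torch.randn(1, 1000, 16)
out = kernelized_attention(q, k, v)                    # (1, 1000, 16)
```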
In conclusion, the paper makes a compelling case for using pure Transformers in graph learning. The approach pairs the expressiveness of standard Transformers with carefully designed token embeddings to capture graph structure, an advance relevant to both theoretical and applied machine learning.