Attending to Graph Transformers: A Critical Overview
The paper "Attending to Graph Transformers" explores the landscape of graph transformers (GTs), a burgeoning field that seeks to enhance the capabilities of graph-based machine learning models through transformer architectures. Over recent years, transformers have proliferated in fields such as natural language processing and computer vision, prompting researchers to adapt this versatile architecture to graph data. In comparison to conventional (message-passing) graph neural networks (GNNs), GTs promise to mitigate issues like over-smoothing and over-squashing, which have historically plagued GNNs.
Taxonomy and Theoretical Insights
Given the myriad of proposed GT architectures, the paper constructs a taxonomy that categorizes them by their theoretical properties, structural and positional encodings, input features, tokenization schemes, and message-propagation mechanisms.
The theoretical discussion centers on expressiveness: in their basic form, GTs are limited relative to GNNs in distinguishing non-isomorphic graphs and in approximating permutation-invariant and -equivariant functions. The limitation stems from the fact that pure self-attention never sees the graph's edges, which is why GTs depend on structural and positional encodings to capture graph structure, a point the taxonomy highlights as crucial. Conversely, GTs can simulate message-passing GNNs under specific conditions, or align with higher-order GNNs, which underscores the interplay between the two architectures while suggesting that enhanced expressiveness is attainable only with sufficiently sophisticated encodings.
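To make this edge-blindness concrete, the following minimal sketch (plain NumPy; all names are illustrative, not from the paper) applies a single vanilla self-attention layer to node features only. Because the adjacency matrix never enters the computation, any two graphs with the same multiset of node features produce the same output, no matter how differently they are wired.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over node features X (n x d).
    Note that no adjacency information enters this computation."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Six nodes with identical features: a 6-cycle and two disjoint triangles
# are non-isomorphic, yet both map to exactly the same output because the
# edges are simply not part of the input.
X = np.ones((6, d))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 4)
```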
Structural and Positional Encodings
The authors then delve into the structural and positional encodings that are crucial for adapting transformers to graph data. These encodings are the pivotal mechanism by which GTs capture local, global, or relative graph structure. The paper categorizes them by the level at which they are applied (node, edge, or graph) and delineates their role in lifting GTs' expressive power beyond that of message-passing GNNs. The discussion also highlights ongoing efforts to make such encodings properly invariant, especially for features like Laplacian eigenvectors, whose sign and basis ambiguities complicate this goal.
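As a concrete illustration of one widely used node-level encoding, the sketch below computes Laplacian eigenvector positional encodings with NumPy. The function name and the sign-flip remark are illustrative assumptions, not code from the paper.

```python
import numpy as np

def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Return the k eigenvectors of the symmetric normalized Laplacian with the
    smallest non-trivial eigenvalues, one row of encodings per node."""
    deg = adj.sum(axis=1)
    # L = I - D^{-1/2} A D^{-1/2}, guarding against isolated nodes.
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    pe = eigvecs[:, 1:k + 1]                 # skip the trivial (constant) eigenvector
    # Eigenvector signs (and bases of repeated eigenvalues) are arbitrary --
    # exactly the invariance issue discussed above. A common training-time
    # heuristic is to randomly flip signs.
    return pe

# Example: 4-cycle graph, 2-dimensional encoding per node.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(laplacian_positional_encoding(A, k=2).shape)  # (4, 2)
```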
Evaluating GTs in Practical Scenarios
The empirical analysis evaluates GTs across a range of tasks and datasets, probing their structural awareness, their handling of heterophilic graphs, and their ability to reduce over-squashing. Notable results show that GTs equipped with a structural bias perform well on tasks requiring structural understanding and consistently outperform standard GNN baselines on several heterophilic datasets. Scalability, however, remains problematic: as graphs and datasets grow, global attention becomes costly, and its signal can be diluted by noise from the many irrelevant nodes each node attends to.
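For context on what such a structural bias can look like in practice, here is a hedged sketch of one common form: an additive attention bias indexed by pairwise shortest-path distance, in the spirit of Graphormer. The function, the lookup table, and the toy graph are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def attention_with_structural_bias(X, Wq, Wk, Wv, spd, bias_table):
    """Attention whose logits receive an additive bias indexed by the pairwise
    shortest-path distances `spd` (an n x n integer matrix)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[1]) + bias_table[spd]
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
n, d, max_dist = 3, 4, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
bias_table = rng.normal(size=max_dist + 1) * 0.1   # one learnable scalar per distance
spd = np.array([[0, 1, 2],                          # shortest-path distances of the
                [1, 0, 1],                          # path graph 0 - 1 - 2
                [2, 1, 0]])
print(attention_with_structural_bias(X, Wq, Wk, Wv, spd, bias_table).shape)  # (3, 4)
```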
Applications and Future Directions
GTs have already found applications in diverse areas, notably molecular property prediction and brain network analysis, exemplifying their potential beyond conventional GNNs. However, the survey stresses that these successes hinge on appropriate structural and positional encodings, particularly when GTs are extended to real-world problems that carry inherent geometric information, such as 3-D molecular conformations.
Suggested future work includes scaling GTs efficiently to larger datasets, improving their interpretability, and systematically evaluating their expressiveness and generalization on larger, more complex graphs. The authors also call for a more principled understanding of what novel encoding strategies actually capture, drawing a parallel to NLP's BERTology, so that GTs' capabilities can be better understood and harnessed.
The paper comprehensively addresses the current state of GTs, their constraints, and their future potential, offering a practical guide for applying them and posing fundamental questions to spur further research in this evolving field. This structured examination provides a foundational understanding of graph transformers while acknowledging their continuing development within the broader context of machine learning on graph data.