Distinguished In Uniform: Self Attention Vs. Virtual Nodes (2405.11951v1)

Published 20 May 2024 in cs.LG

Abstract: Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: graphs of different sizes may require a different approximating network. We first clarify that this form of universality is not unique to GTs: using the same positional encodings, pure MPGNNs and even 2-layer MLPs are also non-uniform universal approximators. We then consider uniform expressivity: the target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is their global computation method: Self-Attention vs. Virtual Node. We prove that neither model is a uniform universal approximator, before proving our main result: neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data, and further augment our study with real-world datasets, observing mixed results that indicate no clear ranking in practice either.
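The architectural contrast at the heart of the paper, global Self-Attention versus a Virtual Node, can be made concrete with a short sketch. The PyTorch code below is an illustrative assumption, not the authors' implementations of SAN, GPS, or their MPGNN + Virtual Node baseline: the self-attention layer lets every node exchange information with every other node directly, while the virtual-node update pools all node states into a single global vector and broadcasts it back.

# Minimal sketch (assumed layer choices and dense-batch shapes), contrasting the
# two global computation mechanisms the paper compares.
import torch
import torch.nn as nn

class GlobalSelfAttention(nn.Module):
    """Global update via Self-Attention: every node attends to every node (O(n^2) per graph)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_graphs, num_nodes, dim), a dense batch for simplicity
        out, _ = self.attn(h, h, h)
        return h + out  # residual update

class VirtualNodeUpdate(nn.Module):
    """Global update via a Virtual Node: pool all node states, transform, broadcast back (O(n))."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_graphs, num_nodes, dim)
        vn = self.mlp(h.sum(dim=1, keepdim=True))  # (num_graphs, 1, dim) global state
        return h + vn  # every node receives the same global message

if __name__ == "__main__":
    h = torch.randn(2, 5, 16)  # 2 toy graphs, 5 nodes each, 16-dim features
    print(GlobalSelfAttention(16)(h).shape)  # torch.Size([2, 5, 16])
    print(VirtualNodeUpdate(16)(h).shape)    # torch.Size([2, 5, 16])

The sketch illustrates why the comparison is non-trivial: attention computes pairwise interactions at quadratic cost, whereas the virtual node squeezes the global state through a single vector at linear cost, and the paper shows that neither mechanism's uniform expressivity subsumes the other's.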

Authors (5)
  1. Eran Rosenbluth (6 papers)
  2. Jan Tönshoff (9 papers)
  3. Martin Ritzert (17 papers)
  4. Berke Kisin (2 papers)
  5. Martin Grohe (92 papers)
Citations (9)
