Graphormer: Unifying Graph Transformers & GNNs

Updated 1 March 2026

Subgraphormer, the Graphormer-family architecture covered here, is a transformer-based model that integrates product graph construction and spectral positional encoding to capture both local and global graph structure.
It replaces the typed-edge MPNN layers of subgraph GNNs with Subgraph Attention Blocks for efficient, structure-sensitive message passing.
Empirical evaluations on molecular, OGB, and long-range graph benchmarks show accuracy that is competitive with or better than traditional GNNs and dense graph transformers.

Graphormer denotes a class of transformer-based architectures designed to operate on graphs, typically by integrating attention mechanisms and positional encodings to capture both local and global graph structure. The Subgraphormer architecture provides a formal unification of recent advances in graph transformers and subgraph-based graph neural networks (subgraph GNNs), coupling the expressive power of subgraph GNNs with the inductive biases of sparse graph transformer attention and efficient, structure-sensitive positional encodings via graph products (Bar-Shalom et al., 2024).

1. Product Graph Construction and Algebraic Foundations

Let $G=(V,E)$ denote an undirected graph with $n$ nodes, adjacency matrix $A\in\mathbb{R}^{n\times n}$ , and node feature matrix $X\in\mathbb{R}^{n\times d}$ . The Subgraphormer architecture builds on the Cartesian product of $G$ with itself, written $G^{\square 2}=G\square G$ . The vertex set of the product graph is $V(G^{\square 2})=V\times V$ . Two nodes $(s,v)$ and $(s',v')$ in $V\times V$ are adjacent if $[s=s'$ and $v\sim_G v']$ or $[v=v'$ and $s\sim_G s']$ . The adjacency matrix of the product graph factors as a Kronecker sum,

$A_{G^{\square 2}}=A\otimes I_n+I_n\otimes A$

where $A\otimes I$ connects $(s,v)$ to the copies $(s',v)$ of $v$ in subgraphs rooted at neighbors $s'$ of $s$ (vertical edges), and $I\otimes A$ connects $(s,v)$ to the neighbors $(s,v')$ of $v$ within the same subgraph rooted at $s$ (horizontal edges). This product structure enables canonical modeling of subgraph-level computations as message passing on $G^{\square 2}$ .
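
The Kronecker-sum construction above can be checked numerically on a small graph. The following is a minimal NumPy sketch, not the authors' implementation; the function names and the row-major flattening of $(s,v)$ to index $s\cdot n+v$ are assumptions made for illustration.

```python
import numpy as np

def cartesian_square_adjacency(A):
    """Adjacency matrix of G □ G as the Kronecker sum A ⊗ I + I ⊗ A."""
    n = A.shape[0]
    I = np.eye(n, dtype=A.dtype)
    vertical = np.kron(A, I)    # (s, v) ~ (s', v) whenever s ~ s' in G
    horizontal = np.kron(I, A)  # (s, v) ~ (s, v') whenever v ~ v' in G
    return vertical + horizontal

def flat_index(s, v, n):
    """Flatten product node (s, v) to a row index (convention assumed here)."""
    return s * n + v

# Toy input: the path graph 0 - 1 - 2.
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
A_prod = cartesian_square_adjacency(A_path)
```

For this toy input the product graph is the 3-by-3 grid: it has 9 nodes and 12 undirected edges, so `A_prod` is a symmetric 9-by-9 matrix whose entries sum to 24.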

2. Subgraph GNNs as Typed-Edge MPNNs on the Product Graph

Maximally expressive subgraph GNNs—such as GNN–SSWL $+$ of Zhang et al. 2023—maintain a hidden state $h^t(s,v)\in\mathbb{R}^d$ at layer $t$ per node pair $(s,v)$ , with updates from its own state, the “point” state $h^t(v,v)$ , messages from horizontal neighbors $\{h^t(s,v') : v'\sim v\}$ , and vertical neighbors $\{h^t(s',v) : s'\sim s\}$ . These updates can be modeled as a single relational graph convolutional network (RGCN), or more generally as a message passing neural network (MPNN) with typed edges, operating on $G^{\square 2}$ .

Define the adjacency tensors

$\begin{aligned} A_{G}((s,v),(s',v')) &= \delta_{s=s'}\mathbf{1}_{v\sim v'} \\ A_{G^S}((s,v),(s',v')) &= \delta_{v=v'}\mathbf{1}_{s\sim s'} \\ A_{\rm point}((s,v),(s',v')) &= \mathbf{1}_{s'=v'=v} \end{aligned}$

The MPNN update per layer is

$h^{t+1} = h^t W_0 + A_G h^t W_G + A_{G^S} h^t W_{G^S} + A_{\rm point} h^t W_{\rm pt}$

where $h^t\in\mathbb{R}^{n^2\times d}$ stacks the states of all product nodes and every weight matrix acts on the feature dimension. This recursion precisely recapitulates the update in GNN–SSWL $+$ , demonstrating that expressive subgraph GNNs are a special instance of MPNNs on $G^{\square 2}$ .
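
The three typed adjacencies and the linear update can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: product node $(s,v)$ is flattened to index $s\cdot n+v$, all weight matrices are applied on the right of the $n^2\times d$ state matrix, and the helper names `typed_adjacencies` and `typed_mpnn_layer` are hypothetical. No nonlinearity is applied, matching the formula as written.

```python
import numpy as np

def typed_adjacencies(A):
    """Build A_G (horizontal), A_{G^S} (vertical) and A_point on V x V."""
    n = A.shape[0]
    I = np.eye(n)
    A_G = np.kron(I, A)    # s = s' and v ~ v'
    A_GS = np.kron(A, I)   # v = v' and s ~ s'
    A_point = np.zeros((n * n, n * n))
    for s in range(n):
        for v in range(n):
            # Row (s, v) reads from the "point" node (v, v).
            A_point[s * n + v, v * n + v] = 1.0
    return A_G, A_GS, A_point

def typed_mpnn_layer(h, adjs, W_0, W_G, W_GS, W_pt):
    """One linear typed-edge update: self term plus one term per edge type."""
    A_G, A_GS, A_point = adjs
    return h @ W_0 + A_G @ h @ W_G + A_GS @ h @ W_GS + A_point @ h @ W_pt

# Toy usage on the path graph 0 - 1 - 2 with random features and weights.
A_path = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
adjs = typed_adjacencies(A_path)
rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(9, d))
W_0, W_G, W_GS, W_pt = (rng.normal(size=(d, d)) for _ in range(4))
h_next = typed_mpnn_layer(h, adjs, W_0, W_G, W_GS, W_pt)
```

Each row of `A_point` contains a single one, so the point term simply copies the state of $(v,v)$ into every $(s,v)$ before the linear map $W_{\rm pt}$ is applied.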

3. Sparse Attention and Subgraph Attention Block (SAB)

In Subgraphormer, each typed-edge RGCN layer is replaced with a sparse, transformer-style attention block, termed the “Subgraph Attention Block” (SAB), analogous to the Graph Attention Network (GAT). Denote the stacked states of all product nodes as $H^t\in\mathbb{R}^{n^2\times d_1}$ . For each symmetric adjacency $\mathcal{A}\in\{A_G,A_{G^S}\}$ , compute query, key, value as $Q^t=H^t W_Q$ , $K^t=H^t W_K$ , $V^t=H^t W_V$ , where $d_2$ denotes the query/key dimension (so $W_Q,W_K\in\mathbb{R}^{d_1\times d_2}$ ), and update with

$[\alpha^t_{\mathcal{A}}]_{ij} = \mathrm{softmax}_{j:\mathcal{A}_{ij}=1}\left(\frac{\langle Q^t_i, K^t_j\rangle}{\sqrt{d_2}}\right)$

$\mathrm{attn}^t_{\mathcal{A}} = \alpha^t_{\mathcal{A}} V^t$

The point channel $\mathit{point}^t(H^t)$ injects root copy states via a GIN-style pointwise layer. The output is pooled over product nodes and passed to a final multilayer perceptron (MLP), preserving permutation invariance.
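
The adjacency-restricted softmax in the attention equations above can be illustrated with a short NumPy sketch. It is a single-head illustration under assumptions, not the SAB as implemented by the authors: for readability it forms the dense $n^2\times n^2$ score matrix and masks it, whereas a practical sparse implementation would compute scores only on existing edges. The function name `masked_attention` and the handling of rows without neighbors (they receive zero attention output) are choices made here.

```python
import numpy as np

def masked_attention(H, adj, W_Q, W_K, W_V):
    """Softmax attention where node i attends only to j with adj[i, j] = 1."""
    Q, K, V = H @ W_Q, H @ W_K, H @ W_V
    d_2 = Q.shape[1]
    scores = (Q @ K.T) / np.sqrt(d_2)
    scores = np.where(adj > 0, scores, -np.inf)   # drop non-edges
    # Subtract the row maximum for numerical stability (0 for empty rows).
    row_max = np.max(scores, axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(scores - row_max)            # exp(-inf) evaluates to 0
    denom = weights.sum(axis=1, keepdims=True)
    alpha = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return alpha @ V, alpha

# Toy usage: horizontal adjacency A_G = I ⊗ A of the path graph 0 - 1 - 2.
A_path = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
A_G = np.kron(np.eye(3), A_path)
rng = np.random.default_rng(0)
d_1, d_2 = 4, 2
H = rng.normal(size=(9, d_1))
W_Q = rng.normal(size=(d_1, d_2))
W_K = rng.normal(size=(d_1, d_2))
W_V = rng.normal(size=(d_1, d_1))
attn_G, alpha_G = masked_attention(H, A_G, W_Q, W_K, W_V)
```

Calling the same function with $A_{G^S}=A\otimes I$ in place of `A_G` gives the vertical attention channel; how the channels are combined with the point channel is left to the description above.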

4. Spectral Product-Graph Positional Encoding

To overcome the expressive limitations of pure attention, Subgraphormer attaches a positional encoding (PE) to each product node based on the Laplacian spectrum of $G^{\square 2}$ . For $G$ , let its Laplacian $L=D-A$ have eigendecomposition $Lv_i=\lambda_i v_i$ . By the properties of the Kronecker sum,

$L_{G^{\square 2}} = L\otimes I + I\otimes L$

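with eigenpairs $(v_i\otimes v_j, \lambda_i+\lambda_j)$ . A set of $k$ eigenvectors of $L_{G^{\square 2}}$ can thus be constructed from the first $\sqrt{k}$ eigenvectors of $L$ by taking all pairwise Kronecker products, at $O(kn^2)$ cost and without decomposing the $n^2\times n^2$ matrix directly. (The $k$ eigenvectors with the smallest eigenvalues $\lambda_i+\lambda_j$ may involve eigenvectors of $L$ with index up to $k$ , so the $\sqrt{k}\times\sqrt{k}$ grid does not always coincide with the $k$ lowest-frequency eigenvectors.) The positional encoding is taken as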

$\pi(s,v)=\left[(v_i)_s (v_j)_v : 1\leq i,j\leq k'\right]\in\mathbb{R}^{k'^2}$

or a flattening of the leading $k$ Kronecker pairs, yielding a PE that is both structurally faithful and computationally tractable.
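
The eigenpair identity and the grid form of $\pi(s,v)$ can be verified with a small NumPy sketch. It assumes the unnormalized Laplacian $L=D-A$ as defined above; the function name `product_graph_pe` and the ordering of the $k'^2$ channels (index $i\cdot k'+j$) are assumptions made for illustration.

```python
import numpy as np

def product_graph_pe(A, k_prime):
    """Return pe[s, v, :] = [(v_i)_s (v_j)_v for i, j < k'] and the sums λ_i + λ_j."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k_prime]      # first k' eigenvectors of L, shape (n, k')
    lam = eigvals[:k_prime]
    # Outer products give the entries of v_i ⊗ v_j at product node (s, v).
    pe = np.einsum("si,vj->svij", U, U).reshape(n, n, k_prime * k_prime)
    pair_eigvals = (lam[:, None] + lam[None, :]).reshape(-1)
    return pe, pair_eigvals

# Toy usage: the path graph 0 - 1 - 2 - 3 with k' = 2, giving a 4-dimensional PE.
A_path4 = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
pe, pair_eigvals = product_graph_pe(A_path4, k_prime=2)

# Check L_{G□G} (v_i ⊗ v_j) = (λ_i + λ_j) (v_i ⊗ v_j) on the explicit product Laplacian.
L4 = np.diag(A_path4.sum(axis=1)) - A_path4
L_prod = np.kron(L4, np.eye(4)) + np.kron(np.eye(4), L4)
is_eigenpair = [
    np.allclose(L_prod @ pe[:, :, c].reshape(-1), pair_eigvals[c] * pe[:, :, c].reshape(-1))
    for c in range(pe.shape[2])
]
```

Only the $n\times n$ Laplacian is decomposed; the explicit $n^2\times n^2$ matrix `L_prod` is built here solely to check the identity.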

5. Expressive Power via Unified Subgraph GNN and Transformer Paradigms

Subgraph GNNs overcome the 1-WL test barrier through node-individualization but remain locally restricted in aggregation. Graph transformers admit global attention but are typically bottlenecked by $O(n^2)$ dense attention and lack node-marking expressivity. The product-graph perspective enables a combination: the vertical/horizontal adjacency and “point” channels realize node-marking expressivity; sparse, attention-driven mixing enables global or selective messaging; and spectral positional encodings break symmetries at higher orders. Subgraphormer strictly simulates any subgraph GNN and augments it with transformer inductive biases and product-graph positional encoding, resulting in richer function classes on graphs.

6. Empirical Evaluation and Comparative Performance

Subgraphormer and its variant with product-graph positional encoding (Subgraphormer+PE) were evaluated on molecular regression tasks (ZINC-12k, ZINC-Full), OGB benchmarks (ogbg-molhiv, ogbg-molbace, ogbg-molesol), the Alchemy-12k QM dataset, and long-range graph tasks (Peptides-func, Peptides-struct). Key performance results include:

Model	ZINC-12k MAE ↓	ogbg-molhiv ROC-AUC ↑	Peptides-struct MAE ↓ (30% subgraphs)
Graphormer-GD	0.081	—	—
GNN–SSWL $+$	0.070	—	—
CIN	—	80.94	—
GSN	—	80.39	—
GPS	—	—	0.2500
Subgraphormer+PE	0.063	80.38	0.2475

On ZINC-12k, Subgraphormer+PE achieves an MAE of 0.063 (vs. 0.070 for GNN–SSWL $+$ and 0.081 for Graphormer-GD). On ogbg-molhiv, its ROC-AUC of 80.38 is competitive with, though slightly below, CIN (80.94) and GSN (80.39). On Peptides-struct with 30% subgraph sampling, it reaches an MAE of 0.2475 (vs. 0.2500 for GPS). Under low subgraph sampling (ZINC-12k, 5%), product-graph PE closes the gap to DSS-GNN: Subgraphormer+PE yields an MAE of 0.175 versus 0.200 for Subgraphormer without PE and 0.179 for DSS-GNN. Overall, the reported results indicate that combining attention with product-graph PE is competitive with or better than pure subgraph GNNs and pure graph transformers on these benchmarks (Bar-Shalom et al., 2024).

7. Synthesis and Broader Context

Subgraphormer represents a principled synthesis of subgraph-based GNNs and sparse graph transformers via the algebraic framework of product graphs. This approach leverages: (a) a message passing view on subgraph aggregation as MPNNs on $G^{\square 2}$ ; (b) replacement of typed-edge aggregation with transformer-derived SABs; (c) spectral product-graph positional encoding for symmetry breaking and structure awareness; and (d) competitive or improved empirical results across molecular, OGB, QM, and long-range graph tasks. This synthesis provides a pathway for further extensions in graph representation learning, suggesting that unifying structured expressivity with attention and spectral encodings is conducive to modeling complex graph-structured data (Bar-Shalom et al., 2024).


References

Bar-Shalom et al. (2024). Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products.
