Recurrent Structure-Reinforced Graph Transformer
- The paper introduces a recurrent Transformer that explicitly models edge temporal states via a two-stage process and structure-aware attention, significantly advancing dynamic link prediction.
- The methodology employs a structure-reinforced design combining global self-attention with topological and path-based feature encoding to effectively capture both local and global graph dynamics.
- Empirical results demonstrate that RSGT outperforms state-of-the-art baselines on various dynamic graph datasets by mitigating over-smoothing and incorporating historical structural cues.
The Recurrent Structure-reinforced Graph Transformer (RSGT) is a framework for discrete dynamic graph representation learning designed to capture the evolving structural and temporal properties of time-evolving graphs. It addresses limitations in previous approaches that combine recurrent neural networks (RNNs) and graph neural networks (GNNs)—notably their inability to adequately encode edge temporal states and their susceptibility to over-smoothing, which collectively hinder the modeling of dynamic node relationships and the extraction of global structural features. RSGT introduces explicit edge temporal-state modeling and an advanced structure-reinforced transformer architecture within a recurrent paradigm, enabling superior local and global feature integration over discrete graph snapshots (Hu et al., 2023).
1. Edge Temporal States Modeling
At the core of RSGT is a two-stage process for each time step $t$. First, it converts the current graph snapshot $G_t$, together with the previous snapshot $G_{t-1}$, into a weighted multi-relation "difference" graph $G_t^w$ defined over the edge union $E_{t-1} \cup E_t$. Retaining edges from $E_{t-1}$ ensures even vanished edges are considered for their residual effects. Each edge $(u, v)$ receives a temporal type among **emerging** ($(u,v) \in E_t \setminus E_{t-1}$), **persisting** ($(u,v) \in E_t \cap E_{t-1}$), or **disappearing** ($(u,v) \in E_{t-1} \setminus E_t$).
Edge weights encode long-term interaction memory: the weight of each edge is computed from its consecutive persistence count (the number of successive snapshots in which it has appeared) together with a set of hyperparameters. This construction yields a multi-relation weighted graph whose topology integrates both dynamic and structural cues, addressing the insufficient edge-state modeling of prior methods.
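To make the construction concrete, the sketch below derives a weighted multi-relation graph from two consecutive edge sets. The weighting rule and the hyperparameters `alpha` and `beta` are illustrative assumptions, not the paper's exact formula; only the three temporal types and the use of a consecutive persistence count follow the description above.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]

def build_weighted_multirelation_graph(
    prev_edges: Set[Edge],
    curr_edges: Set[Edge],
    persistence: Dict[Edge, int],
    alpha: float = 1.0,   # assumed scale for emerging/persisting weights
    beta: float = 0.5,    # assumed residual factor for disappearing edges
) -> Dict[Edge, Tuple[str, float]]:
    """Label every edge in E_{t-1} ∪ E_t as emerging / persisting / disappearing
    and attach a weight derived from its consecutive persistence count."""
    graph: Dict[Edge, Tuple[str, float]] = {}
    for e in prev_edges | curr_edges:                 # vanished edges are kept for residual effects
        if e in curr_edges and e not in prev_edges:
            persistence[e] = 1
            graph[e] = ("emerging", alpha)
        elif e in curr_edges and e in prev_edges:
            persistence[e] = persistence.get(e, 1) + 1
            graph[e] = ("persisting", alpha * persistence[e])   # weight grows with persistence
        else:                                          # only in prev_edges
            k = persistence.pop(e, 1)
            graph[e] = ("disappearing", beta * k)      # residual weight for a vanished edge
    return graph
```

Calling this once per time step while carrying `persistence` forward yields the sequence of weighted graphs consumed by the transformer described next.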
2. Structure-reinforced Graph Transformer Design
The Structure-reinforced Graph Transformer (SGT) operates at each time step $t$ on the weighted multi-relation graph $G_t^w$ and the previous hidden node embeddings $H_{t-1}$. SGT stacks $L$ identical encoding layers with the following components:
(a) Global Self-Attention: Standard Transformer attention is computed as $\mathrm{softmax}\big(QK^{\top}/\sqrt{d}\big)V$, with query, key, and value projections $Q$, $K$, and $V$ obtained from the node embeddings through learnable weight matrices.
(b) Graph Structural Encoding: For every ordered node pair $(u, v)$, two sets of features are extracted:
- Topological attributes characterizing the pair's position and connectivity in $G_t^w$;
- Temporal path features along the shortest path from $u$ to $v$: the sequence of edge temporal states along that path is embedded, positionally encoded, and then encoded with a 1D convolution.
These are concatenated to yield a joint structural encoding $s_{uv}$ for the pair.
(c) Structure-aware Attention Reinforcement: Raw self-attention scores $a_{uv}$ are modulated by an affine map dependent on $s_{uv}$, i.e., $\tilde{a}_{uv} = \gamma(s_{uv})\, a_{uv} + \delta(s_{uv})$, where $\gamma$ and $\delta$ are learnable projections of the structural encoding.
(d) Update and Residuals: Updated node representations are produced by normalizing the reinforced scores with a softmax and multiplying by the value matrix $V$; standard residual and feed-forward connections apply within each layer. After $L$ layers, an outer residual connection adds the previous embeddings $H_{t-1}$ to the output.
This architecture enables the transformer to capture both semantic and structure/path-aware dependencies, directly incorporating dynamic edge information into the self-attention mechanism.
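The single-head sketch below illustrates how components (a), (c), and (d) fit together: raw dot-product scores are rescaled and shifted by learnable functions of a pairwise structural encoding before the softmax, followed by the usual residual and feed-forward updates. All module and parameter names are illustrative assumptions, and the structural encoding `s` is taken as given (in RSGT it would combine the topological and path-based features of component (b)).

```python
import torch
import torch.nn as nn

class StructureAwareAttention(nn.Module):
    """Illustrative single-head structure-reinforced attention layer
    (a sketch, not the authors' implementation)."""

    def __init__(self, d_model: int, d_struct: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = nn.Linear(d_struct, 1)   # gamma(s_uv): assumed learnable scale
        self.shift = nn.Linear(d_struct, 1)   # delta(s_uv): assumed learnable shift
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # h: (n, d_model) node embeddings; s: (n, n, d_struct) pairwise structural encodings
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = q @ k.t() / (h.size(-1) ** 0.5)          # (a) raw global self-attention scores
        scores = self.scale(s).squeeze(-1) * scores + self.shift(s).squeeze(-1)  # (c) affine reinforcement
        attn = torch.softmax(scores, dim=-1)
        h = self.norm1(h + attn @ v)                      # (d) attention update + residual
        return self.norm2(h + self.ffn(h))                # (d) feed-forward + residual
```

Stacking $L$ such layers and adding the outer residual to $H_{t-1}$ gives one SGT pass over a snapshot.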
3. Recurrent Learning Over Snapshots
RSGT models dynamic graph representation learning as a shallow recurrence across discrete graph snapshots. With $H_0$ initialized to the raw node features, the recurrence is
$$H_t = \mathrm{SGT}\big(G_t^{w}, H_{t-1}\big) + H_{t-1}.$$
This residual sum accumulates past structural-temporal updates, allowing each $H_t$ to encode the full dynamic context up to snapshot $t$. The approach ensures both historical persistence and adaptation to new graph structures.
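A minimal sketch of this rollout follows, assuming an `sgt` callable that maps a weighted snapshot and the previous embeddings to a structural-temporal update (the interface is hypothetical):

```python
def rsgt_rollout(sgt, snapshots, X):
    """Recurrent pass over discrete snapshots (hypothetical interface):
    sgt(G_w, H) returns the structural-temporal update for one snapshot."""
    H = X                      # H_0: initial node features
    history = []
    for G_w in snapshots:      # weighted multi-relation graphs, one per time step
        H = sgt(G_w, H) + H    # outer residual: H_t = SGT(G_t^w, H_{t-1}) + H_{t-1}
        history.append(H)
    return history             # H_1, ..., H_T; each H_t encodes dynamics up to t
```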
4. Training Objective, Algorithm, and Complexity
The primary supervised task is dynamic link prediction. For each candidate edge $(u, v)$ at step $t$, a feature vector is formed from the learned representations $h_u^t$ and $h_v^t$, and a shallow MLP with a sigmoid output predicts the link probability $\hat{y}_{uv}$. Training minimizes a binary cross-entropy loss with a regularization term on the model parameters $\Theta$:
$$\mathcal{L} = -\sum_{(u,v)} \Big[\, y_{uv}\log\hat{y}_{uv} + (1 - y_{uv})\log\big(1 - \hat{y}_{uv}\big) \Big] + \lambda \lVert \Theta \rVert_2^2 .$$
The optimization uses AdamW over all model parameters, including the edge-weighting hyperparameters if they are learned rather than fixed.
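A hedged sketch of the prediction head and objective is given below; the concatenation-based pairing of node representations, the explicit L2 penalty, and all names are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinkPredictor(nn.Module):
    """Shallow MLP scoring a candidate edge from its endpoint representations
    (concatenation is an assumed pairing; the paper's construction may differ)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, h_u: torch.Tensor, h_v: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([h_u, h_v], dim=-1)).squeeze(-1)  # one logit per edge

def link_loss(logits: torch.Tensor, labels: torch.Tensor, parameters, lam: float = 1e-4):
    """Binary cross-entropy plus an explicit (assumed) L2 penalty on the parameters."""
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    l2 = sum((p ** 2).sum() for p in parameters)
    return bce + lam * l2

# Optimization over all parameters with AdamW, e.g.:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```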
Computational Complexity: For one snapshot, the per-layer cost is dominated by global self-attention, which scales quadratically with the number of nodes, and by shortest-path encoding, which additionally depends on the shortest-path length horizon and the edge embedding dimension. Total runtime is linear in the number of snapshots, and practical scalability is maintained by constraining quantities such as the shortest-path horizon and the history window.
5. Empirical Performance and Ablation Results
RSGT has been empirically validated on four real-world dynamic graphs:
| Dataset | Nodes | Edges | Train/Test |
|---|---|---|---|
| twi-Tennis | 1,000 | 40,839 | 100/20 |
| CollegeMsg | 1,899 | 59,835 | 25/63 |
| cit-HepTh | 7,577 | 51,315 | 77/1 |
| sx-MathOF | 24,818 | 506,550 | 64/15 |
On dynamic link prediction, RSGT outperforms ten strong baselines (DeepWalk, node2vec, GraphSAGE, EvolveGCN, CoEvoSAGE, ROLAND, CTDNE, TGAT, CAW, TREND):
- twi-Tennis: Accuracy 87.6% vs TREND 74.0% (+18.3% relative)
- CollegeMsg: 86.8% vs 74.6% (+16.4%)
- cit-HepTh: 87.2% vs 80.4% (+8.5%)
- sx-MathOF: 87.9% vs 79.8% (+10.1%)
F1 scores demonstrate commensurate improvements.
Ablation analysis confirms two architectural choices as essential: (a) explicit edge temporal-state modeling (types and weights), and (b) structure-aware attention (pairwise topological and path-based features). Removing either leads to performance drops of up to 15%. RSGT remains robust across variations in window size, number of Transformer layers, attention heads, and shortest-path horizon.
6. Significance, Limitations, and Context
RSGT addresses critical shortcomings of existing dynamic graph embedding algorithms by providing a unified, recurrent, and structure-aware Transformer architecture with explicit modeling of edge temporal states. The integration of dynamic edge types, long-term edge weights, and structure-conditioned attention distinguishes RSGT in both representation quality and downstream task performance. This design mitigates GNN over-smoothing, enables extraction of global graph structure, and remains computationally practical for graphs of moderate to large size.
By consistently outperforming contemporary baselines in dynamic link prediction and demonstrating necessary ablation-verified design advances, RSGT substantiates the importance of fine-grained temporal-state modeling and structure-aware attention in dynamic graph learning. A plausible implication is that further refinements of Transformer-based recurrent paradigms, potentially with deeper recurrence, online inference, or continuous-time extensions, could continue to advance state-of-the-art performance on evolving graph data (Hu et al., 2023).