
GLFormer: Efficient Dynamic Graph Modeling

Updated 23 November 2025
  • GLFormer is an attention-free Transformer-style architecture designed for dynamic graph temporal link prediction using adaptive local token mixing and hierarchical aggregation.
  • It introduces a learnable local token mixer with positional and temporal components to efficiently fuse recent neighbor information, reducing computational cost.
  • Empirical evaluations show improved average precision and AUC with 3–10x speedups over traditional self-attention methods on benchmark datasets.

GLFormer is an attention-free, Transformer-style architecture designed for efficient dynamic graph modeling, particularly temporal link prediction. It advances over traditional self-attention-based dynamic graph Transformers by introducing adaptive local token mixing and hierarchical aggregation for scalable modeling of temporally evolving relationships in large or high-frequency graphs. GLFormer’s architecture challenges the presumption that global, full self-attention is essential for state-of-the-art predictive performance in dynamic graphs, instead leveraging context-aware local aggregation mechanisms that fuse information from temporally ordered interactions with high computational efficiency and robustness to noise (Zou et al., 16 Nov 2025).

1. Motivation and Architectural Rationale

Transformer-style models for dynamic graphs such as DyGFormer and TGAT rely on self-attention to capture long-term temporal dependencies. However, self-attention incurs $\mathcal{O}(N^2)$ computational and memory complexity per sequence of length $N$ and may indiscriminately aggregate noisy or irrelevant distant events. Recent analyses in the "MetaFormer" paradigm indicate that the expressive power of Transformers can largely be attributed to their macro-architectural traits (residual connections, layer normalization, and feed-forward subnetworks) rather than strictly to the attention operator.

Empirical studies within dynamic graph contexts show that local mixing strategies, such as pooling or MLP-based token mixers, can match or outperform full attention with substantially lower computational cost. GLFormer therefore adopts attention-free local mixers within a classic Transformer skeleton, combining efficiency with architectural expressivity. The core is a stack of layers, each comprising (i) a learnable local token-mixing sub-block and (ii) a channel-mixing feed-forward module, interleaved with residual connections and layer normalization. Two key innovations distinguish GLFormer: an adaptive, context-aware token mixer, and a hierarchical aggregation scheme enabling progressively enlarged temporal receptive fields in a causal, efficient manner.
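To make the layer layout concrete, here is a minimal PyTorch-style sketch of one such block; the pre-norm placement, GELU activation, and expansion ratio are assumptions rather than details taken from the paper, and the token mixer is left pluggable (see Section 2).

```python
import torch
import torch.nn as nn

class GLFormerStyleBlock(nn.Module):
    """One layer: an attention-free local token-mixing sub-block followed by a
    channel-mixing feed-forward module, each with layer normalization and a
    residual connection (assumed pre-norm ordering)."""

    def __init__(self, dim: int, token_mixer: nn.Module, ffn_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mixer = token_mixer                 # e.g. the adaptive mixer of Section 2
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(                      # channel-mixing feed-forward module
            nn.Linear(dim, ffn_ratio * dim),
            nn.GELU(),
            nn.Linear(ffn_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor, timestamps: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, dim) chronologically ordered neighbor embeddings
        x = x + self.token_mixer(self.norm1(x), timestamps)   # local token mixing
        x = x + self.ffn(self.norm2(x))                       # channel mixing
        return x
```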

2. Adaptive Token Mixer

For each node $u$ with chronological neighbor embeddings $I_u = [X_{u_1}, \ldots, X_{u_N}] \in \mathbb{R}^{N \times d}$ (associated with timestamps $t_1 \leq \cdots \leq t_N$), the adaptive token mixer aggregates the $M$ most recent neighbors for every position $i$:

$$H_{i,:} = \sum_{p=0}^{M-1} \alpha_p^i \cdot I_{i-p,:}$$

Mixing weights $\alpha_p^i$ combine two factors:

  • Positional importance $w_p$: a learned per-token-offset weight capturing ordinal significance.
  • Temporal proximity $\theta_p^i$: a softmax over exponentially decayed timestamp intervals $\Delta t_p^i = t_i - t_{i-p}$, favoring temporally proximate events.

The composite mixing coefficient is given by:

$$\alpha_p^i = \beta \cdot w_p + (1-\beta) \cdot \theta_p^i$$

$$\theta_p^i = \frac{\exp(-\Delta t_p^i)}{\sum_{q=0}^{M-1} \exp(-\Delta t_q^i)}$$

where $\beta \in [0,1]$ is a learnable scalar. This context-aware token mixing adaptively fuses position and timing, enabling the model to privilege relevant recent information without attending globally over all $N^2$ token pairs.
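A minimal sketch of this adaptive token mixer in PyTorch follows; the sigmoid parametrisation of $\beta$, the softmax normalisation of the positional weights over valid offsets, and the masking convention are our assumptions, and the class name AdaptiveTokenMixer is illustrative.

```python
import torch
import torch.nn as nn

class AdaptiveTokenMixer(nn.Module):
    def __init__(self, window: int):
        super().__init__()
        self.M = window                                    # number of recent neighbors mixed
        self.w = nn.Parameter(torch.zeros(window))         # positional weights w_p
        self.beta_logit = nn.Parameter(torch.zeros(()))    # sigmoid(.) keeps beta in [0, 1]

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d) chronologically ordered neighbor embeddings, t: (B, N) timestamps
        B, N, _ = x.shape
        # Offsets p = 0..M-1 relative to each position i; i - p < 0 is masked for causality.
        idx = (torch.arange(N, device=x.device).unsqueeze(1)
               - torch.arange(self.M, device=x.device).unsqueeze(0))          # (N, M)
        valid = idx >= 0
        idx = idx.clamp(min=0)
        x_win, t_win = x[:, idx], t[:, idx]            # (B, N, M, d) and (B, N, M)
        dt = t.unsqueeze(-1) - t_win                   # Δt_p^i = t_i - t_{i-p}
        # Temporal proximity θ_p^i: softmax over exponentially decayed intervals.
        theta = torch.softmax((-dt).masked_fill(~valid, float("-inf")), dim=-1)
        # Positional importance w_p (normalising over valid offsets is an assumption).
        w = torch.softmax(self.w.expand(B, N, -1).masked_fill(~valid, float("-inf")), dim=-1)
        beta = torch.sigmoid(self.beta_logit)
        alpha = beta * w + (1.0 - beta) * theta        # α_p^i = β w_p + (1-β) θ_p^i
        return torch.einsum("bnm,bnmd->bnd", alpha, x_win)   # H_{i,:}
```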

3. Hierarchical Aggregation and Temporal Receptive Field

To efficiently enlarge the temporal receptive field and capture longer-term patterns, GLFormer stacks $L$ token-mixer layers with dilated offset ranges:

$$R_l = \{ p \in \mathbb{Z} \mid s^{l-1} \leq p \leq s^l \}$$

with kernel size $K_l = |R_l|$ and layer-wise offset boundaries $s^0 = 0 < s^1 < \ldots < s^L$. The $l$-th layer mixer processes the previous layer's outputs $H_{TA}^{(l-1)}$ as:

$$H_{i,:}^{(l)} = \sum_{p \in R_l} (\alpha_p^i)^{(l)} \cdot H_{TA,\,i-p,:}^{(l-1)}$$

Causality is preserved by masking out cases where $i - p < 1$. The hierarchical stacking yields a dilated, causal temporal receptive field reaching up to $s^L$, while operations remain local within each layer.
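As a concrete illustration of how these offset ranges partition the history, the short sketch below enumerates $R_l$ and the kernel sizes $K_l$ for a hypothetical set of boundaries (the values of $s^l$ are ours, not reported settings).

```python
def offset_ranges(boundaries):
    """boundaries = [s_0, s_1, ..., s_L] with s_0 = 0 < s_1 < ... < s_L."""
    return [list(range(boundaries[l - 1], boundaries[l] + 1))
            for l in range(1, len(boundaries))]

s = [0, 4, 16, 64]                                   # hypothetical layer-wise boundaries
for l, R in enumerate(offset_ranges(s), start=1):
    print(f"layer {l}: offsets {R[0]}..{R[-1]}, kernel size K_l = {len(R)}")
# layer 1: offsets 0..4,   K_1 = 5
# layer 2: offsets 4..16,  K_2 = 13
# layer 3: offsets 16..64, K_3 = 49
```

Each layer mixes only its own $K_l$ offsets of the previous layer's outputs, so the stack reaches far back in time while every individual operation stays local and causal.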

4. Computational and Parameter Efficiency

The computational and space complexity of GLFormer compares favorably against self-attention:

Operation | Complexity | Parameters
Self-attention (per layer) | $\mathcal{O}(N^2 d)$ | $\mathcal{O}(d^2)$
GLFormer token mixer (layer $l$) | $\mathcal{O}(N K_l d)$ | $K_l$ for $w_p$, plus a few scalars ($\beta$)

Stacking $L$ layers, the total cost is $\sum_{l=1}^{L} \mathcal{O}(N K_l d)$. As $\sum_l K_l \ll N$ in practice, overall complexity approaches quasi-linear scaling in $N$. Memory and parameter requirements are also reduced, as GLFormer does not require projection matrices or pairwise attention maps. This confers notable speedups during both training and inference.
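For intuition, here is a back-of-envelope comparison of per-layer token-mixing cost, using illustrative values (the choices of $N$, $d$, and $K_l$ below are assumptions matching the hypothetical boundaries above, not the paper's settings):

```python
N, d = 512, 172                               # sequence length and embedding dimension
K = [5, 13, 49]                               # kernel sizes K_l with sum(K) << N

attention_ops = N * N * d                     # O(N^2 d) pairwise attention
glformer_ops = sum(N * K_l * d for K_l in K)  # sum_l O(N K_l d) local mixing

print(attention_ops / glformer_ops)           # = N / sum(K) ≈ 7.6x fewer operations
```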

5. Empirical Performance and Experimental Protocol

GLFormer was evaluated on six benchmark dynamic-graph datasets—Wikipedia, Reddit, MOOC, LastFM, SocialEvo, and Enron—for transductive temporal link prediction. Data consists of timestamped user-item or user-user sequences, partitioned 70%/15%/15% chronologically. Comparison was performed across five prominent backbone encoders: TGN, TCL, TGAT, CAWN, and DyGFormer. Four token mixing strategies were compared for each backbone:

  1. Vanilla Transformer (self-attention)
  2. Pooling over the $s$ most recent neighbors
  3. MLP Mixer over tokens
  4. GLFormer’s adaptive mixer with hierarchical aggregation

Metrics included Average Precision (AP) and AUC-ROC. GLFormer achieved the best average rank across all datasets, yielding AP improvements of 0.2–1.5 over vanilla attention. For DyGFormer, replacing attention with GLFormer attained +0.35 AP on MOOC and +0.24 AP on Reddit. Inference-time analyses demonstrated 3–10× speedups over vanilla Transformers and 1.5–3× over MLP Mixers. Ablation studies confirmed the necessity of the learnable positional and temporal components, residual connections, and the choice of non-linearity (GELU preferred over ReLU): removing any of these degrades AP by up to 1.5.
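For reference, AP and AUC-ROC in temporal link prediction are typically computed by scoring held-out positive edges against sampled negative edges; a minimal sketch with placeholder scores (not the paper's evaluation code) follows.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
pos_scores = rng.random(1000)                # model scores for observed (positive) test edges
neg_scores = rng.random(1000)                # scores for sampled negative edges

labels = np.concatenate([np.ones_like(pos_scores), np.zeros_like(neg_scores)])
scores = np.concatenate([pos_scores, neg_scores])

print("AP :", average_precision_score(labels, scores))
print("AUC:", roc_auc_score(labels, scores))
```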

6. Limitations and Future Directions

A principal limitation is the fixed nature of the layerwise offset boundaries $s^l$, which must be preselected; dynamically learning these or employing attention-like sparsification mechanisms could augment adaptivity. GLFormer is tailored to first-order neighbor sequences, and extension to multi-hop or heterogeneous neighbor types (e.g., relations with attribute-rich edges) remains open. While the present work concentrates on transductive link prediction, additional axes for future research include inductive settings, dynamic node classification, and continuous hypergraph forecasting.

7. Impact and Significance

GLFormer demonstrates that attention-free, local, adaptive token mixing architectures can rival or surpass global self-attention in dynamic graph settings both in accuracy and computational efficiency. Its architectural paradigm, grounded in residual learning, layer normalization, and hierarchical local aggregation, supports robust modeling of evolving networked systems while facilitating scalability to long sequences and high-frequency interaction data. This development questions the necessity of expensive full self-attention in dynamic graphs and signals a movement toward lighter, more scalable temporal graph models (Zou et al., 16 Nov 2025).
