Temporal Graph Neural Networks

Updated 4 January 2026
  • Temporal Graph Neural Networks (TGNNs) are dynamic deep learning models that incorporate time-aware message passing and memory mechanisms to capture evolving graph interactions.
  • They leverage advanced techniques such as higher-order modeling, trajectory encoding, and adaptive neighborhood sampling to boost prediction accuracy and computational efficiency.
  • Recent advancements focus on robust transfer learning, resilience against adversarial attacks, and enhanced interpretability, driving improved performance across diverse applications.

Temporal Graph Neural Networks (TGNNs) are a class of deep learning architectures designed to model dynamic graphs whose topology, attributes, and interactions evolve over time. TGNNs generalize static Graph Neural Networks by incorporating temporal information into message passing, memory, and aggregation mechanisms, allowing them to learn complex patterns in domains such as social networks, recommendation systems, communication infrastructures, and biological systems. Continuous development in TGNN methodologies has focused on improving expressivity, computational efficiency, adaptability to irregular data and evolving graph structures, transfer learning for data-scarce regimes, robustness to adversarial attacks, and the interpretability of model predictions.

1. Mathematical Foundations and Core TGNN Architectures

TGNNs operate on temporal graphs typically defined as sequences of time-stamped events $G = \{e_1, \ldots, e_N\}$, where each event $e_i = (u, v, t, x_u, x_v, x_{uv}(t))$ encodes an interaction between nodes $u$ and $v$ at time $t$ together with node and edge features (Agarwal et al., 2 Mar 2025). The canonical TGNN workflow maintains a time-dependent memory vector $\mu_u(t)$ for each node $u$, updated upon each event through message computation and aggregation:

$$m_i = \mathrm{MSG}\big([\mu_u(t^-), \mu_v(t^-), \Delta t, x_{uv}(t)]\big)$$
$$\mu_u(t) = \mathrm{GRU}\big(m_i, \mu_u(t^-)\big)$$
$$h_u(t) = \mathrm{COMBINE}\big(x_u, \mu_u(t)\big)$$

where $\mathrm{MSG}$ is typically a small MLP, $\Delta t = t - t^-$ is the time elapsed since node $u$'s last memory update, and $\mathrm{COMBINE}$ is a feature fusion operator. Temporal aggregation at layer $l$ recursively computes:

$$m_u^{(l)}(t) = \mathrm{AGGREGATE}\big(\{(h_v^{(l-1)}(t),\, t - t_{uv},\, x_{uv})\}_{v \in \mathcal{N}_u(t)}\big)$$
$$h_u^{(l)}(t) = \mathrm{COMBINE}\big(h_u^{(l-1)}(t),\, m_u^{(l)}(t)\big)$$

The binary cross-entropy loss for future link prediction is:

$$L = -\sum_{(u,v,t')^+} \log p_{uv}(t') \;-\; \sum_{(u,v,t')^-} \log\big(1 - p_{uv}(t')\big)$$
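
For concreteness, here is a minimal sketch of this loss with sampled negatives, assuming PyTorch; the scores would come from a link decoder over the temporal node embeddings $h_u(t)$ and $h_v(t)$, and the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(pos_scores, neg_scores):
    """Binary cross-entropy over observed (positive) and sampled (negative) future links.

    pos_scores / neg_scores: raw logits for p_uv(t') before the sigmoid.
    """
    pos_loss = F.binary_cross_entropy_with_logits(
        pos_scores, torch.ones_like(pos_scores), reduction="sum")   # -sum log p_uv(t') over positives
    neg_loss = F.binary_cross_entropy_with_logits(
        neg_scores, torch.zeros_like(neg_scores), reduction="sum")  # -sum log(1 - p_uv(t')) over negatives
    return pos_loss + neg_loss

# toy usage: five positive events, five sampled negatives
loss = link_prediction_loss(torch.randn(5), torch.randn(5))
```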

Prominent TGNNs include TGN (Rossi et al., 2020), TGAT, DySAT, Jodie, DyRep, and their variants. TGN introduces a general framework where per-node memories summarize historical information, and embeddings are produced via graph-based aggregation or direct projection of memories.
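A minimal PyTorch sketch of such a memory-based, per-event update follows; the class, method names, and tensor shapes are illustrative simplifications, not TGN's actual implementation or any library's API:

```python
import torch
import torch.nn as nn

class ToyTGNMemory(nn.Module):
    """Illustrative per-event memory update (MSG -> GRU -> COMBINE)."""

    def __init__(self, num_nodes, node_dim, edge_dim, mem_dim):
        super().__init__()
        self.register_buffer("memory", torch.zeros(num_nodes, mem_dim))   # mu_u(t), one row per node
        self.register_buffer("last_update", torch.zeros(num_nodes))       # time t^- of last update per node
        self.msg = nn.Sequential(                                          # MSG over [mu_u, mu_v, dt, x_uv]
            nn.Linear(2 * mem_dim + 1 + edge_dim, mem_dim), nn.ReLU(),
            nn.Linear(mem_dim, mem_dim))
        self.gru = nn.GRUCell(mem_dim, mem_dim)                            # memory updater
        self.combine = nn.Linear(node_dim + mem_dim, mem_dim)              # COMBINE(x_u, mu_u(t))

    def update(self, u, v, t, x_u, x_uv):
        """Process one event (u, v, t) and return the updated embedding h_u(t)."""
        dt = (t - self.last_update[u]).view(1, 1)                          # Delta t = t - t^-
        m_i = self.msg(torch.cat(
            [self.memory[u:u + 1], self.memory[v:v + 1], dt, x_uv.view(1, -1)], dim=-1))
        self.memory[u] = self.gru(m_i, self.memory[u:u + 1]).squeeze(0)
        self.last_update[u] = t
        return self.combine(torch.cat([x_u.view(1, -1), self.memory[u:u + 1]], dim=-1))

# toy usage: 100 nodes, 8-dim node features, 4-dim edge features, 16-dim memory
model = ToyTGNMemory(num_nodes=100, node_dim=8, edge_dim=4, mem_dim=16)
h_u = model.update(u=3, v=7, t=torch.tensor(5.0), x_u=torch.randn(8), x_uv=torch.randn(4))
```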

2. Advancements in Expressivity: Higher-Order and Trajectory-Aware Models

Most TGNNs model only pairwise interactions, but the Higher-order structure Temporal Graph Neural Network (HTGN) uses temporal hypergraphs to encode group interactions and higher-order structures that shape link formation and evolution (Liu et al., 21 May 2025). Hyperedges are constructed by clique enumeration, and a memory vector is maintained per hyperedge:

$$\mathbf{m}[c] = \sum_{E \in \mathcal{D}_c} \mathrm{MLP}(\mathbf{m}[E])\, \alpha^{-\beta (t^* - t[E])}$$

Message passing and node embeddings incorporate time-encoding and multi-hop temporal neighborhoods, resulting in strictly increased distinguishing power over standard pairwise TGNNs.
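
A hedged sketch of the decayed hyperedge-memory aggregation in the equation above, assuming PyTorch; the MLP, decay constants, and timestamps are placeholders rather than HTGN's published hyperparameters:

```python
import torch
import torch.nn as nn

def aggregate_hyperedge_memory(mem, timestamps, t_star, mlp, alpha=2.0, beta=0.1):
    """m[c] = sum_E MLP(m[E]) * alpha^(-beta * (t* - t[E])) over hyperedges E containing c.

    mem:        (num_hyperedges, d) memory vectors m[E] for the hyperedges in D_c
    timestamps: (num_hyperedges,)   last-update times t[E]
    t_star:     scalar              current time t*
    """
    decay = alpha ** (-beta * (t_star - timestamps))    # older hyperedges contribute less
    return (mlp(mem) * decay.unsqueeze(-1)).sum(dim=0)  # weighted sum -> m[c]

# toy usage: three hyperedges with 8-dim memories
mlp = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
m_c = aggregate_hyperedge_memory(torch.randn(3, 8), torch.tensor([1.0, 4.0, 9.0]), 10.0, mlp)
```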

Trajectory Encoding TGNNs (TETGN) address the trade-off between anonymous and non-anonymous TGNNs in transductive and inductive regimes by introducing learnable, automatically expandable node identifiers as temporal positional features and performing trajectory-aware aggregation combined with memory (Xiong et al., 15 Apr 2025). Multi-head attention fuses the trajectory and memory streams, yielding state-of-the-art performance on both prediction and classification tasks.
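
As a generic illustration of fusing two embedding streams with cross-attention (an assumed, simplified stand-in for TETGN's actual fusion module), in PyTorch:

```python
import torch
import torch.nn as nn

# Trajectory tokens act as queries, memory tokens as keys/values.
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
trajectory = torch.randn(1, 5, 32)   # (batch, num_temporal_neighbors, dim) trajectory-aware features
memory = torch.randn(1, 5, 32)       # matching memory-based features for the same neighbors
fused, attn_weights = attn(trajectory, memory, memory)  # (1, 5, 32) fused representation
```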

3. Efficient and Adaptive Neighborhood Sampling

Scaling TGNNs to large dynamic graphs necessitates efficient temporal neighbor sampling. Most TGNNs use fixed heuristics (uniform or most-recent sampling), which can be suboptimal. FLASH generalizes historical neighbor selection with a graph-adaptive, learnable mechanism that scores neighbors from their features and link contexts and selects them via differentiable top-k; it is trained end-to-end with self-supervised ranking losses to maximize relevant historical context (Feldman et al., 9 Apr 2025). TASER further optimizes accuracy and scalability through temporal adaptive sampling of both mini-batches and neighbors, equipped with GPU-parallel neighbor finders and VRAM caches, and supports self-supervised and REINFORCE-style co-training of the sampler policy (Deng et al., 2024). Both yield systematic gains in accuracy (2–8% AP or MRR improvements), robustness to noise, and large speedups (5×–60×) in data preparation.
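
The following sketch illustrates score-based temporal neighbor selection in the spirit of FLASH and TASER; the scoring network, the hard top-k, and the softmax weighting are simplifying assumptions standing in for the papers' differentiable top-k and ranking objectives:

```python
import torch
import torch.nn as nn

class NeighborScorer(nn.Module):
    """Scores candidate historical neighbors of a target node and keeps the top-k."""

    def __init__(self, feat_dim, hidden=32):
        super().__init__()
        # each candidate is described by a concatenated [neighbor features, edge features, time gap] vector
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, cand_feats, k):
        scores = self.net(cand_feats).squeeze(-1)          # (num_candidates,)
        k = min(k, scores.numel())
        top_scores, top_idx = torch.topk(scores, k)        # hard top-k; FLASH uses a differentiable relaxation
        return top_idx, torch.softmax(top_scores, dim=0)   # selected neighbors and normalized weights

# toy usage: 20 candidate neighbors described by 10-dim feature vectors, keep 5
scorer = NeighborScorer(feat_dim=10)
idx, weights = scorer(torch.randn(20, 10), k=5)
```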

4. Transfer Learning and Robustness in Data-Scarce and Adversarial Regimes

Transferring learned temporal patterns between disjoint graphs is non-trivial because TGNN memory is node-specific. The MINTT framework introduces a structured bipartite encoding that disentangles node representations from features and uses a four-phase FGAT module for inductive transfer. MINTT enables effective transfer of memory and weights from source to target graphs with memory initialization (Agarwal et al., 2 Mar 2025), yielding substantial gains in low-data settings (up to 56% in AP, 400% in MRR, and 210% in Recall@20 over non-transfer baselines).

TGNNs are vulnerable to temporally targeted adversarial attacks. HIA (High Impact Attack) employs data-driven surrogate scoring of node importance, combining temporal degree growth, betweenness, and community-aware centrality to identify disruptively influential nodes; attacks are then carried out by hybrid edge injection and deletion. HIA achieves a 35.55% drop in MRR versus 23.7% for prior baselines (Jeon et al., 29 Sep 2025). Recommended defenses include adversarial training on temporally perturbed sequences, dynamic graph purification, and regularization that penalizes over-reliance on critical nodes.
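
The sketch below shows how such a surrogate importance score might combine the three signals using networkx; the weights, the bridging term, and the two-snapshot degree-growth estimate are assumptions, not HIA's actual scoring function:

```python
import networkx as nx

def surrogate_importance(events, w_growth=1.0, w_btw=1.0, w_comm=1.0):
    """Rank nodes by a blended importance score (illustrative weights and terms).

    events: list of time-ordered interactions (u, v, t).
    """
    mid = len(events) // 2
    g_early = nx.Graph([(u, v) for u, v, t in events[:mid]])   # earlier snapshot
    g_full = nx.Graph([(u, v) for u, v, t in events])          # full graph
    btw = nx.betweenness_centrality(g_full)
    communities = nx.algorithms.community.greedy_modularity_communities(g_full)
    comm_of = {n: i for i, com in enumerate(communities) for n in com}
    scores = {}
    for n in g_full.nodes:
        growth = g_full.degree(n) - (g_early.degree(n) if n in g_early else 0)       # temporal degree growth
        nbrs = list(g_full.neighbors(n))
        bridging = sum(comm_of[m] != comm_of[n] for m in nbrs) / max(len(nbrs), 1)   # community-aware term
        scores[n] = w_growth * growth + w_btw * btw[n] + w_comm * bridging
    return sorted(scores, key=scores.get, reverse=True)  # most influential candidates first

# toy usage: a handful of time-stamped interactions
events = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0), (0, 2, 4.0), (3, 4, 5.0), (1, 4, 6.0)]
ranked = surrogate_importance(events)
```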

5. Model Interpretability: Motif-Based, Bayesian, and Universal Explainers

Interpretability of TGNNs remains challenging given temporal complexity and memory-based aggregation. TempME uses information bottleneck principles to extract the minimal set of temporal motifs essential for prediction, with motif sampling, structural encoding, and scoring to maximize explanation fidelity and sparsity (Chen et al., 2023). Motif-aware explanations also enhance link prediction by up to 22.96 percentage points in AP. Probabilistic graphical model-based approaches mine dominant dynamic dependencies via Bayesian networks learned across sliding temporal windows (He et al., 2022). TGIB merges prediction and explanation via built-in temporal graph information bottleneck layers, applying Gumbel-Softmax for stochastic subgraph selection and optimizing mutual information regularization; TGIB yields top AP and explanation quality on multiple benchmarks (Seo et al., 2024). Universal explainers such as GRExplainer abstract TGNN inputs into node-sequence and retained-matrix representations, using RNN generative models for automated, instance-level explanation generation that is efficient and applicable to both snapshot- and event-based TGNNs (Li et al., 28 Dec 2025).
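
As an illustration of Gumbel-Softmax-based stochastic edge selection of the kind TGIB uses (a generic sketch rather than the paper's architecture), in PyTorch:

```python
import torch
import torch.nn.functional as F

def sample_explanatory_edges(edge_logits, tau=0.5, hard=True):
    """Draw a (nearly) binary keep/drop mask per temporal edge via Gumbel-Softmax.

    edge_logits: (num_edges, 2) unnormalized scores for [drop, keep] per edge.
    Returns a (num_edges,) mask that remains differentiable w.r.t. the logits.
    """
    mask = F.gumbel_softmax(edge_logits, tau=tau, hard=hard)  # straight-through estimator when hard=True
    return mask[:, 1]                                          # indicator of keeping each edge

# toy usage: a learnable scorer would produce these logits from edge embeddings
logits = torch.randn(6, 2, requires_grad=True)
keep_mask = sample_explanatory_edges(logits)
```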

Rigorous impossibility results show that perturbation-based black-box explanation methods (node-perturbation, edge-perturbation, and joint perturbation) cannot reliably identify internal TGNN causal structures, necessitating white-box access or time-resolved perturbation for faithful interpretability (Vu et al., 2022).

6. Evaluation Metrics, Scalability, and Design Principles

Traditional instance-based metrics (AP, AUC) inadequately capture temporal error clustering, as shown by volatility-aware studies (Su et al., 2024). Volatility Cluster Statistics (VCS) provide refined temporal performance analysis, revealing clustering patterns unique to TGNN architectures and enabling regularization to minimize error bursts without sacrificing AP accuracy.

Large-scale benchmarking (10,000 GPU hours) has revealed key design principles for ideal TGNNs (Yang et al., 2024):

  • The most-recent neighbor sampler and attention aggregator outperform uniform sampling and MLP-Mixer on most datasets.
  • Node memory modules should be chosen according to the temporal repetition pattern: RNN/GRU for sessional (short-term), static embedding-table for periodic (long-term) datasets.
  • Shallow architectures with memory (1-2 layers, k ≈ 10 neighbors) saturate link prediction performance, making deeper/wider models inefficient.
  • Non-learnable (cosine-based) time encoding is preferred for stability; a minimal sketch follows this list.
  • Interplay among sampling, aggregation, and memory modules benefits from modular plug-and-play frameworks, allowing flexible adaptation to different temporal data regimes.
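
A minimal sketch of a fixed cosine time encoding of the kind recommended above, in PyTorch; the geometric frequency schedule is an assumption, and the benchmarked encoders may use different frequencies:

```python
import torch

def cosine_time_encoding(delta_t, dim=16, max_period=10_000.0):
    """Fixed (non-learnable) time encoding: cosines at geometrically spaced frequencies.

    delta_t: (N,) time gaps between the current time and each neighbor interaction.
    Returns: (N, dim) encoding that is stable because it has no trainable parameters.
    """
    freqs = 1.0 / (max_period ** (torch.arange(dim, dtype=torch.float32) / dim))
    return torch.cos(delta_t.unsqueeze(-1) * freqs)   # broadcast (N, 1) * (dim,) -> (N, dim)

# toy usage: encode three time gaps
enc = cosine_time_encoding(torch.tensor([0.5, 3.0, 12.0]))
```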

Transforming TGNNs into efficient autoregressive sequence models built on transformers (TF-TGN) provides direct scalability to billion-edge graphs: with engineered kernels and parallel sampling, it achieves training speedups of 2.20× to over 10× while matching or exceeding the accuracy of baseline TGNNs (Huang et al., 2024).
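
A generic sketch of this autoregressive sequence view, in PyTorch: a node's chronologically ordered interactions become a token sequence processed with a causal mask (the embedding scheme and dimensions here are assumptions, not TF-TGN's engineered pipeline):

```python
import torch
import torch.nn as nn

# Each token embeds one interaction (neighbor id, time gap, edge features) of a target node.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True), num_layers=2)
tokens = torch.randn(1, 12, 32)                                     # (batch, history_length, d_model)
causal_mask = nn.Transformer.generate_square_subsequent_mask(12)    # autoregressive: no peeking ahead
h = encoder(tokens, mask=causal_mask)                                # (1, 12, 32) temporal representations
```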

7. Applications, Advanced Methods, and Future Directions

TGNNs have achieved state-of-the-art results in domains such as session-based recommendation (TempGNN) (Oh et al., 2023), irregular time series forecasting with time-continuous latent states and ODE dynamics (TGNN4I) (Oskarsson et al., 2023), and rapid adaptation to evolving topologies via temporal graph rewiring with expander propagation (TGR), which alleviates under-reaching, over-squashing, and memory staleness (Petrović et al., 2024).

Challenges remain in generalizing TGNNs to handle heterogeneous graphs, continuous feature encoding, robust transfer across highly dissimilar domains, meta-learning adaptation for target scarcity, principled similarity metrics for temporal graph transfer, scalable explainers for group and class-level predictions, and more expressive adversarial and denoising frameworks. The growing diversity of TGNN architectures opens the door to dynamic graph learning systems that are expressively powerful, interpretable, robust, and scalable across a wide spectrum of temporal data regimes.
