
Dynamic Graph Neural Networks

Updated 16 December 2025
  • Dynamic Graph Neural Networks (DGNNs) are architectures that jointly encode spatial structure and temporal evolution for predictive tasks on evolving graphs.
  • They integrate discrete and continuous-time models, transformer-based methods, and adaptive mechanisms to capture changing patterns in real-world systems.
  • DGNNs offer practical insights for applications in social networks, recommender systems, and finance, with scalability improvements through model decoupling and hardware acceleration.

Dynamic Graph Neural Networks (DGNNs) are a class of neural architectures that combine structural representation learning with temporal modeling for graphs whose topology and attributes evolve over time. DGNNs have established themselves as the state-of-the-art for supervised and self-supervised learning on dynamic graphs, addressing core challenges in domains such as social interaction modeling, recommender systems, financial transaction networks, biological systems, and physical process forecasting. Central to DGNNs is the integration of topological (spatial) information and temporal dependencies, yielding time-aware node and edge embeddings suitable for prediction tasks inherently tied to temporal evolution.

1. Formal Foundations and Model Taxonomy

A dynamic graph is a system in which the vertex set and edge set change over time, with associated node and edge attributes possibly varying as well. Two common representations are:

  • Discrete-time dynamic graphs (DTDG): Sequences of graph snapshots $\{G^1, \ldots, G^T\}$, where each $G^t = (V^t, E^t, X^t)$.
  • Continuous-time dynamic graphs (CTDG): Streams of timestamped interaction events $\{(u_i, v_i, t_i)\}$.
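
To make the two representations concrete, the following is a minimal sketch using plain Python dataclasses; the `Snapshot` and `Event` containers and their fields are illustrative assumptions rather than a standard API.

```python
# Minimal sketch (assumed, not from any cited paper) of the two dynamic-graph
# representations; arrays are NumPy for simplicity.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Snapshot:
    """One graph G^t = (V^t, E^t, X^t) in a discrete-time dynamic graph."""
    edges: np.ndarray      # shape (num_edges, 2): (source, target) pairs at time t
    features: np.ndarray   # shape (num_nodes, feature_dim): node attributes X^t

# DTDG: an ordered sequence of snapshots {G^1, ..., G^T}
DiscreteTimeDynamicGraph = List[Snapshot]

@dataclass
class Event:
    """One timestamped interaction (u_i, v_i, t_i) in a continuous-time dynamic graph."""
    u: int
    v: int
    t: float

# CTDG: a chronologically ordered stream of interaction events
ContinuousTimeDynamicGraph = List[Event]
```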

The goal of DGNNs is to learn functions $f_\theta$ that encode both instantaneous structural information and the temporal evolution, producing embeddings $h_i^t$ per node over time for downstream prediction tasks. The model taxonomy can be stratified as follows (Skarding et al., 2020):

| Class | Temporal Representation | Canonical Examples |
|---|---|---|
| Pseudo-dynamic | Static weighted GNN (+ decay) | TDGNN [Qu et al.] |
| Discrete-time | Snapshot sequence | GCRN-M1/M2, DySAT, EvolveGCN |
| Continuous-time | Event sequence | JODIE, DyRep, TGAT, TGN |

In discrete-time models, DGNNs are constructed by stacking a spatial GNN (e.g., GCN, GAT) with a temporal encoder (RNN, self-attention, etc.), or by integrating recurrent operations within the message passing itself.

2. Core Model Architectures

  • Stacked DGNNs: Apply a spatial GNN at each time step to generate node-level embeddings $z_i^t$, feeding the output into a temporal model such as an LSTM or self-attention encoder (a minimal code sketch of this pattern appears after this list):

$$h_i^t = \mathrm{LSTM}\left(h_i^{t-1}, z_i^t\right)$$

Example: GCRN-M1, DySAT (Skarding et al., 2020).

  • Integrated DGNNs: Fuse spatial and temporal updates, e.g., by replacing the internal convolutions of a convLSTM with graph convolutions:

$$f_t = \sigma\left(W_f *_\mathcal{G} X_t + U_f *_\mathcal{G} h_{t-1} + w_f \odot c_{t-1} + b_f\right)$$

Example: GCRN-M2, EvolveGCN (Skarding et al., 2020).

  • Continuous-time DGNNs: Process individual interaction events and maintain per-node embeddings updated recursively, often including time-encoding for elapsed intervals:

$$h_v^t = \mathrm{DGNN}\left(h_v^{t^-},\ \mathrm{AGG}\left(\{\text{neighbors at } t' < t\}\right)\right)$$

Example: JODIE, DyRep, TGN, TGAT (Skarding et al., 2020, Jiang et al., 2023).

  • Transformer-based DGNNs: Employ self-attention mechanisms (SAMs) to model sequential interactions, with advances in interaction-level chronology, mixed temporal encoding, and bidirectional interaction features. TIDFormer introduces a particularly interpretable SAM at the interaction level (Peng et al., 31 May 2025).
  • High-order and Overlap-aware Models: Incorporate second-order and high-order structural signals such as neighborhood overlap directly in the message passing, as in NO-HGNN, to enrich representation power for tasks like link prediction (Wang, 7 Jun 2025).
  • Decoupled DGNNs: Separate graph propagation and downstream sequence modeling, allowing one to use arbitrary predictors (e.g. LSTM, Transformer) on precomputed temporal node representations, increasing scalability (Zheng et al., 2023).
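
To make the stacked pattern concrete, the sketch below applies a simple mean-aggregation graph layer per snapshot and carries each node's state across time with an LSTM cell, implementing $h_i^t = \mathrm{LSTM}(h_i^{t-1}, z_i^t)$. The `SimpleGraphConv` and `StackedDGNN` classes, the dense-adjacency input, and all dimensions are illustrative assumptions, not the architecture of GCRN-M1, DySAT, or any other cited model.

```python
# Hedged sketch of a stacked DGNN: spatial GNN per snapshot + LSTM over time.
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """Mean-aggregation graph layer: z = ReLU(W [x || mean of neighbor x])."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # adj: dense (N, N) adjacency; average neighbor features per node.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adj @ x / deg
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

class StackedDGNN(nn.Module):
    """Spatial GNN applied per snapshot, LSTMCell carrying node state across time."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.gnn = SimpleGraphConv(in_dim, hidden_dim)
        self.cell = nn.LSTMCell(hidden_dim, hidden_dim)

    def forward(self, snapshots):
        # snapshots: list of (x, adj) pairs, one per time step, sharing N nodes.
        n = snapshots[0][0].shape[0]
        h = torch.zeros(n, self.cell.hidden_size)
        c = torch.zeros(n, self.cell.hidden_size)
        out = []
        for x, adj in snapshots:
            z = self.gnn(x, adj)          # spatial embedding z_i^t
            h, c = self.cell(z, (h, c))   # temporal update h_i^t
            out.append(h)
        return torch.stack(out)           # (T, N, hidden_dim) time-aware embeddings
```

In practice, the resulting time-aware embeddings (the final step or the full sequence) feed a task head such as a link-prediction scorer or node classifier.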

3. Temporal and Structural Encoding Mechanisms

DGNNs incorporate temporal information through various mechanisms:

  • Explicit time encodings: Functions (e.g., sinusoidal, time2vec, calendar-based) map elapsed intervals into feature vectors (Jiang et al., 2023, Peng et al., 31 May 2025); a minimal sketch of such an encoder appears after this list.
  • Memory modules: GRU, LSTM, or purpose-built memory stacks maintain per-node temporal state, with architectures dictating whether spatial and temporal updates are independent, sequential, or integrated (Ma et al., 2018, Jiang et al., 2023).
  • Attention over history: Self-attention layers (e.g., DySAT, TIDFormer) model the influence of past states and interactions, with proposals for interaction-level tokens leading to interpretable and robust models (Peng et al., 31 May 2025).
  • Message passing with adaptive forgetting: Propagate information to neighbors according to recency, strength, and tie decay, as in streaming DGNNs (Ma et al., 2018).
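
As an illustration of an explicit time encoding, the following is a minimal sketch of a learnable sinusoidal encoder in the spirit of time2vec-style functions; the class name `TimeEncoding`, its parameterization, and the dimension are assumptions, not the exact formulation used by any cited model.

```python
# Hedged sketch of a learnable sinusoidal time encoding for elapsed intervals.
import torch
import torch.nn as nn

class TimeEncoding(nn.Module):
    """Maps an elapsed time Δt to a feature vector via learnable frequencies."""
    def __init__(self, dim):
        super().__init__()
        self.freq = nn.Parameter(torch.randn(dim))   # learnable angular frequencies
        self.phase = nn.Parameter(torch.zeros(dim))  # learnable phase shifts

    def forward(self, delta_t):
        # delta_t: tensor of shape (...,) of elapsed intervals; output shape (..., dim).
        return torch.cos(delta_t.unsqueeze(-1) * self.freq + self.phase)

# Usage: encode the interval between a past interaction and the prediction time.
enc = TimeEncoding(dim=16)
phi = enc(torch.tensor([0.0, 3.5, 120.0]))  # shape (3, 16)
```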

Structural information is encoded using standard GNN aggregation functions, with advanced models supporting high-order (e.g., $k$-hop, overlap-aware) structures (a generic overlap computation is sketched in code after the list below):

  • Tensor-based high-order propagation: Third-mode tensor products and learnable time mixing enable efficient modeling of multi-hop and time cross-correlation (Wang, 7 Jun 2025).
  • Spectral decomposition: DMD-GNNs employ data-driven, low-rank spectral filtering along principal dynamic modes recovered from sequences of node features, aligned with dynamical systems theory (Shi et al., 8 Oct 2024).
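
For the overlap-aware signal mentioned above, the following hedged sketch computes a Jaccard neighborhood-overlap score for every node pair from a dense adjacency matrix; it illustrates the raw structural signal only and is not the propagation scheme of NO-HGNN or any other cited model.

```python
# Hedged sketch: Jaccard neighborhood overlap between all node pairs.
import numpy as np

def neighborhood_overlap(adj: np.ndarray) -> np.ndarray:
    """Jaccard overlap |N(u) ∩ N(v)| / |N(u) ∪ N(v)| for every node pair."""
    adj = (adj > 0).astype(np.float64)        # binarize the adjacency matrix
    common = adj @ adj.T                      # counts of shared neighbors
    deg = adj.sum(axis=1)
    union = deg[:, None] + deg[None, :] - common
    return np.divide(common, union, out=np.zeros_like(common), where=union > 0)
```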

4. Training Objectives, Losses, and Performance Results

DGNNs are trained primarily for dynamic link prediction, node classification, and sometimes temporal forecasting. Standard objectives include:

  • Binary cross-entropy loss for link prediction, often with negative sampling (sketched in code after this list).
  • Cross-entropy loss for node classification.
  • Temporal point process log-likelihood for event generation in TPP-based models.
  • Contrastive pre-training for temporal and structural discrimination (CPDG) using triplet-margin or auxiliary losses (Bei et al., 2023).
  • Information bottleneck regularization to compress representations to the minimal sufficient consensus state and enhance robustness (DGIB) (Yuan et al., 9 Feb 2024).
  • Spectral or operator regularization: Encouraging dynamics learned by DMD modes to approximate observed feature transitions (Shi et al., 8 Oct 2024).
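
To ground the most common objective, the sketch below scores positive and uniformly sampled negative edges with a dot product over time-aware embeddings and applies binary cross-entropy; the dot-product scorer, the uniform sampling scheme, and the function name `link_prediction_loss` are simplifying assumptions.

```python
# Hedged sketch of dynamic link prediction with BCE and negative sampling.
import torch
import torch.nn.functional as F

def link_prediction_loss(h, pos_edges, num_nodes, num_neg=1):
    # h: (num_nodes, dim) node embeddings at prediction time.
    # pos_edges: (num_pos, 2) long tensor of observed (source, target) pairs.
    src, dst = pos_edges[:, 0], pos_edges[:, 1]
    pos_score = (h[src] * h[dst]).sum(dim=-1)

    # Uniform negative sampling: corrupt the target of each positive edge.
    neg_dst = torch.randint(0, num_nodes, (pos_edges.shape[0] * num_neg,))
    neg_src = src.repeat(num_neg)
    neg_score = (h[neg_src] * h[neg_dst]).sum(dim=-1)

    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat([torch.ones_like(pos_score), torch.zeros_like(neg_score)])
    return F.binary_cross_entropy_with_logits(scores, labels)
```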

Empirical evaluations consistently show that DGNNs outperform both static GNNs and traditional dynamic embedding baselines across a range of datasets, tasks, and domains (Skarding et al., 2020, Ma et al., 2018, Jiang et al., 2023, Zheng et al., 2023, Bei et al., 2023, Shi et al., 8 Oct 2024, Wang, 7 Jun 2025).

5. Scalability and Hardware Acceleration

Scalability is a critical consideration for DGNN deployment:

  • Decoupling propagation and prediction enables the use of arbitrary sequence models and supports billion-edge graphs on a single machine (Zheng et al., 2023).
  • Incremental computation and caching: Frameworks such as ReInc systematically reuse intermediate GNN aggregations, cache temporal feature sequences, and exploit independence between sliding training windows to achieve order-of-magnitude speedups and near-linear distributed scaling (Guan et al., 25 Jan 2025).
  • FPGA and hardware acceleration: DGNN-Booster demonstrates multi-level pipelining (across time steps and nodes) and adaptable dataflow architectures for DGNNs on FPGAs, achieving up to $8.4\times$ speedup over GPU and three orders of magnitude higher energy efficiency for inference (Chen et al., 2023).
  • Dynamic kNN evaluation and approximate neighborhood search: In point cloud applications, dynamic graph construction (e.g., via EdgeConv) constitutes over 90% of inference latency, motivating hardware and algorithmic optimization (Parikh et al., 2023).

6. Explainability, Robustness, and Pre-training

Explainability and robustness have emerged as principal concerns in DGNN research:

  • Explainability frameworks: The DGExplainer system applies layer-wise relevance propagation (LRP) through both spatial and temporal layers to ascribe prediction influence to specific nodes, timestamps, and inputs, overcoming the limitations of static explainers (Xie et al., 2022).
  • Robustness under adversarial perturbations: Information-theoretic DGIB models employ minimal-sufficient-consensual (MSC) conditions to compress and regularize temporal embeddings, providing resistance against targeted and untargeted attacks (Yuan et al., 9 Feb 2024).
  • Self-supervised and contrastive pre-training: Pre-training with temporal and structural contrast, as in CPDG, substantially enhances generalization and transfer across tasks, time splits, and even domains (Bei et al., 2023).
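
As a rough illustration of temporal contrastive pre-training, the sketch below applies a triplet-margin loss that pulls a node's embedding toward its own embedding at a nearby time and pushes it away from a random other node; this is a generic sketch and does not reproduce CPDG's specific sampling strategy or loss design.

```python
# Hedged, generic sketch of triplet-style temporal contrastive pre-training.
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)

def temporal_contrastive_loss(h_t, h_prev):
    # h_t, h_prev: (num_nodes, dim) embeddings of the same nodes at two nearby times.
    anchor = h_t
    positive = h_prev                     # same node at an earlier time
    perm = torch.randperm(h_t.shape[0])   # random other nodes as negatives
    negative = h_t[perm]                  # (rare self-matches ignored for brevity)
    return triplet(anchor, positive, negative)
```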

7. Current Challenges and Future Directions

Outstanding challenges for DGNNs include:

  • Scalability to truly massive, rapidly evolving graphs, supporting both continuous and discrete streams.
  • Temporal granularity: Adapting models to variable bin sizes and capturing relevant timescales, with coarse bins sometimes improving robustness by reducing timestamp noise (Jiang et al., 2023).
  • Heterogeneous and high-order structures: Generalizing to multi-relational, signed, or multiplex dynamic graphs, and incorporating high-order dependencies (Wang, 7 Jun 2025).
  • Model-agnostic uncertainty quantification: Wrapper frameworks such as GSNOP provide ODE-based, uncertainty-aware prediction modules for DGNNs, particularly in sparse-data regimes (Luo et al., 2022).
  • Automated architecture and hyperparameter selection: Efficient grid and policy search as complexity and dataset diversity grow.
  • Hardware-aware and out-of-core algorithms: Integration of dataflow and cache optimizations for training and inference on resource-constrained devices (Guan et al., 25 Jan 2025, Chen et al., 2023).

DGNN research continues to advance along axes of dynamic representation power, scalability, interpretability, robustness, and efficiency, presenting a rich landscape of architectures and methodologies for time-variant graph learning.
