DyGSSM: Multi-view Dynamic Graph Embeddings with State Space Model Gradient Update (2505.09017v1)

Published 13 May 2025 in cs.LG and cs.SI

Abstract: Most dynamic graph representation learning methods involve dividing a dynamic graph into discrete snapshots to capture the evolving behavior of nodes over time. Existing methods primarily capture only local or global structures of each node within a snapshot using message-passing and random walk-based methods. Then, they utilize sequence-based models (e.g., transformers) to encode the temporal evolution of node embeddings, and meta-learning techniques to update the model parameters. However, these approaches have two limitations. First, they neglect the extraction of global and local information simultaneously in each snapshot. Second, they fail to consider the model's performance in the current snapshot during parameter updates, resulting in a lack of temporal dependency management. Recently, the HiPPO (High-order Polynomial Projection Operators) algorithm has gained attention for its ability to optimize and preserve sequence history in State Space Models (SSMs). To address the aforementioned limitations in dynamic graph representation learning, we propose a novel method called Multi-view Dynamic Graph Embeddings with State Space Model Gradient Update (DyGSSM). Our approach combines Graph Convolution Networks (GCN) for local feature extraction and random walk with Gated Recurrent Unit (GRU) for global feature extraction in each snapshot. We then integrate the local and global features using a cross-attention mechanism. Additionally, we incorporate an SSM based on the HiPPO algorithm to account for long-term dependencies when updating model parameters, ensuring that model performance in each snapshot informs subsequent updates. Experiments on five public datasets show that our method outperforms existing baseline and state-of-the-art (SOTA) methods in 17 out of 20 cases.

Summary

  • The paper introduces a multi-view model that fuses local GCN and global GRU embeddings to capture both short- and long-range graph structures.
  • It employs a State Space Model with HiPPO-based gradient updates to effectively handle temporal dependencies in dynamic graph snapshots.
  • The window-based training mechanism over successive snapshots improves scalability and models long-term dependencies for link prediction.

This paper introduces DyGSSM (Multi-view Dynamic Graph Embeddings with State Space Model Gradient Update) (2505.09017), a novel approach for dynamic graph representation learning, specifically applied to the task of link prediction. The authors address limitations in existing dynamic graph methods, which often struggle to simultaneously capture both local and global structural information within graph snapshots and fail to effectively manage temporal dependencies during model parameter updates.

DyGSSM tackles these challenges by proposing a multi-view architecture that combines local and global feature extraction for each graph snapshot and integrates a State Space Model (SSM) with a HiPPO-based gradient update mechanism.

Here's a breakdown of the DyGSSM implementation and application:

  1. Multi-view Node Embedding per Snapshot:
    • Local View: A standard Graph Convolutional Network (GCN) is used to capture local structural information for each node in a snapshot $\mathcal{G}_t$. The GCN aggregates information from immediate neighbors through message passing. The implementation follows a layer update style similar to WinGNN (Mou et al., 2023), using shared trainable weight matrices and skip connections across layers. This produces local node embeddings $\mathbf{X}_t^{local}$.
    • Global View: To capture broader graph structure beyond immediate neighbors, a biased Random Walk (RW) coupled with a Gated Recurrent Unit (GRU) is employed. For each node, biased random walks (tuned towards exploring distant nodes) are performed to generate node sequences. A fixed number of the most frequent nodes from these sequences are selected. These sequences are then processed by a two-layer GRU (or a lightweight variant, Light GRU) to produce global node embeddings $\mathbf{X}_t^{global}$. Weights for the GRU layers are shared across snapshots. The use of Light GRU (Feng et al., 2024) is an implementation consideration for scalability, offering reduced parameters and parallelism.
    • Integration: The local ($\mathbf{X}_t^{local}$) and global ($\mathbf{X}_t^{global}$) embeddings for the same snapshot are fused using a cross-attention mechanism. The local embeddings serve as the query, and the global embeddings as key and value. This attention mechanism learns to weigh the importance of global information relative to local information when creating the final fused embedding $\mathbf{X}_t^{fused}$. A single-head attention is used for parameter efficiency; a sketch of this fusion step appears after this list.
  2. SSM-based Gradient Update:
    • The core mechanism for handling temporal dependencies during parameter updates involves an SSM. The model calculates a prediction loss $\mathcal{L}_t^{fused}$ for the fused embeddings $\mathbf{X}_t^{fused}$ at the current snapshot $t$ (e.g., binary cross-entropy for link prediction). A separate loss $\mathcal{L}_t^{global}$ is calculated for the global embeddings $\mathbf{X}_t^{global}$ to update the GRU parameters.
    • For updating the GCN and cross-attention parameters, the gradient of $\mathcal{L}_t^{fused}$ with respect to the model parameters, $\frac{\delta\mathcal{L}_t^{fused}}{\delta \Theta_t}$, is computed.
    • A dynamic weight $weight_t = \frac{1}{\mathcal{L}_t^{fused} + \epsilon}$ is calculated, inversely proportional to the prediction loss. This ensures that snapshots where the model performs poorly contribute less to the state update.
    • An SSM state $s_t$ is maintained and updated based on the previous state $s_{t-1}$, the current gradient $\hat{G}_t$ (derived from $\frac{\delta\mathcal{L}_t^{fused}}{\delta \Theta_t}$), and the dynamic weight $weight_t$: $s_t = \hat{K} s_{t-1} + \hat{G}_t \, weight_t$. The matrix $\hat{K}$ is initialized using the HiPPO algorithm (Gu et al., 2022) to effectively preserve and project historical information.
    • The model parameters $\Theta_{t+1}$ for the next snapshot are updated based on the current parameters $\Theta_t$, the current gradient $\frac{\delta\mathcal{L}_t^{fused}}{\delta \Theta_t}$, and the SSM state $s_t$: $\Theta_{t+1} \leftarrow \frac{\delta\mathcal{L}_t^{fused}}{\delta \Theta_t} + \Theta_t \times s_t$. This incorporates a form of meta-learning in which past performance (captured by the SSM state, which accumulates weighted historical gradients) guides the parameter update for the future; see the gradient-update sketch after this list.
  3. Window-based Multi-snapshot Training:
    • Instead of updating parameters based only on adjacent snapshots, DyGSSM uses an overlapping sliding window of snapshots of size $\Delta t$.
    • Within a window, snapshots are processed sequentially. For each snapshot $t$ in the window, the local, global, and fused embeddings are computed, and the losses $\mathcal{L}_t^{fused}$ and $\mathcal{L}_t^{global}$ are calculated. The SSM state and parameters $\Theta_t$ are updated based on $\mathcal{L}_t^{fused}$ and passed to the next snapshot $t+1$.
    • This sequential processing within the window accumulates information. After processing all snapshots in the window, a final backpropagation step is performed using the losses aggregated across the window: $\mathcal{L}_{fused} = \frac{1}{\Delta t} \sum_{i=t+1}^{t+\Delta t} \mathcal{L}_{i}^{fused}$ and $\mathcal{L}_{global} = \frac{1}{\Delta t} \sum_{i=t}^{t+\Delta t} \mathcal{L}_{i}^{global}$. This leverages the window to capture longer dependencies than adjacent snapshots alone; see the training-loop sketch after this list.
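
To make the fusion step in item 1 concrete, the following is a minimal PyTorch sketch of single-head cross-attention in which the local GCN embeddings form the query and the global GRU embeddings form the key and value. The module structure, projection layers, and attention over all nodes are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Single-head cross-attention: local (GCN) embeddings query global (RW+GRU) embeddings.
    A minimal sketch; the paper's exact projections and normalization may differ."""
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # queries from the local view
        self.k_proj = nn.Linear(dim, dim)  # keys from the global view
        self.v_proj = nn.Linear(dim, dim)  # values from the global view
        self.scale = dim ** -0.5

    def forward(self, x_local: torch.Tensor, x_global: torch.Tensor) -> torch.Tensor:
        # x_local, x_global: (num_nodes, dim) embeddings for the same snapshot
        q = self.q_proj(x_local)
        k = self.k_proj(x_global)
        v = self.v_proj(x_global)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # fused embeddings X_t^{fused}, shape (num_nodes, dim)

# Toy usage with random stand-ins for the two per-snapshot views
x_local = torch.randn(100, 64)   # GCN output for snapshot t
x_global = torch.randn(100, 64)  # GRU-over-random-walks output for snapshot t
x_fused = CrossAttentionFusion(64)(x_local, x_global)
```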
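
The SSM-based gradient update in item 2 can be sketched for a single flat parameter vector as below. The HiPPO-LegS construction follows Gu et al., but the exact scaling and discretization conventions, and how the state interacts with full parameter tensors (the elementwise product here), are assumptions made for illustration.

```python
import torch

def hippo_legs(n: int) -> torch.Tensor:
    """HiPPO-LegS transition matrix (Gu et al.); sign and scaling conventions
    vary across implementations, so treat this as illustrative."""
    A = torch.zeros(n, n)
    for i in range(n):
        A[i, i] = i + 1
        for j in range(i):
            A[i, j] = ((2 * i + 1) * (2 * j + 1)) ** 0.5
    return -A

def ssm_gradient_step(theta, grad, state, K_hat, loss, eps=1e-8):
    """One DyGSSM-style update for a flat parameter vector:
       state <- K_hat @ state + grad * weight_t,  with weight_t = 1 / (loss + eps)
       theta <- grad + theta * state              (elementwise product is an assumption)"""
    weight_t = 1.0 / (loss + eps)
    state = K_hat @ state + grad * weight_t
    theta = grad + theta * state
    return theta, state

# Toy usage: a 16-dimensional parameter vector evolving over two snapshots
dim = 16
theta, state, K_hat = torch.randn(dim), torch.zeros(dim), hippo_legs(dim)
for snapshot_loss, snapshot_grad in [(0.7, torch.randn(dim)), (0.4, torch.randn(dim))]:
    theta, state = ssm_gradient_step(theta, snapshot_grad, state, K_hat, snapshot_loss)
```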
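
The window-based training procedure in item 3 can be outlined as follows. The model, per-snapshot features, and losses here are toy placeholders so the loop runs end to end; in DyGSSM they would be the multi-view pipeline and the link-prediction losses described above, with the SSM/HiPPO update applied at each snapshot inside the window.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins so the loop is runnable; not the paper's actual components.
model = nn.Linear(64, 64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
snapshots = [torch.randn(100, 64) for _ in range(6)]  # toy per-snapshot node features

def fused_and_global_losses(model, g):
    out = model(g)
    return out.pow(2).mean(), out.abs().mean()  # placeholders for L_t^fused and L_t^global

window = 3  # Delta t
for start in range(len(snapshots) - window + 1):       # overlapping sliding windows
    fused_losses, global_losses = [], []
    for g in snapshots[start:start + window]:          # sequential pass through the window
        l_fused, l_global = fused_and_global_losses(model, g)
        # (the per-snapshot SSM/HiPPO parameter update would be applied here)
        fused_losses.append(l_fused)
        global_losses.append(l_global)
    # Final backpropagation on the window-averaged losses
    loss = torch.stack(fused_losses).mean() + torch.stack(global_losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```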

Practical Implementation Details and Considerations:

  • Preprocessing: The biased random walks for global feature extraction are precomputed offline for efficiency during training.
  • Task: The model is applied to the link prediction task, formulated as a binary classification problem on pairs of nodes (see the sketch after this list).
  • Architecture: The core components are GCNs, GRUs (or Light GRUs), cross-attention, and the SSM state logic. The GCN and Cross-Attention parameters are updated using the fused loss and SSM, while GRU parameters are updated using the global loss.
  • Optimization: Standard optimizers like Adam can be used. The training involves iterating through overlapping windows and performing backpropagation based on aggregated losses within the window.
  • Scalability: The Light GRU variant is proposed to improve scalability by reducing parameters and enabling parallel computation within sequence processing. Memory usage can still be a concern for large graphs and window sizes, as evidenced by the out-of-memory (OOM) errors some baselines encountered on larger datasets.
  • Hyperparameters: Key hyperparameters include the number of GCN layers, GRU layers, embedding dimension sizes, random walk length and repetitions, window size ($\Delta t$), learning rate, and optimizer settings. The choice of GRU variant (traditional vs. Light) is also an implementation choice.
  • Evaluation: Performance is measured using standard metrics like Accuracy, AUC, MRR, and Recall@10, with emphasis on ranking metrics (MRR, Recall@10, AP) due to potential class imbalance.
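
For the link-prediction formulation noted above (binary classification over node pairs), a common decoder is a dot product of the two node embeddings followed by binary cross-entropy. The sketch below uses this generic formulation as an assumption; it is not necessarily the paper's exact decoder.

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(embeddings: torch.Tensor,
                         pos_pairs: torch.Tensor,
                         neg_pairs: torch.Tensor) -> torch.Tensor:
    """Dot-product decoder + BCE over positive and (sampled) negative node pairs.
    embeddings: (num_nodes, dim); pos_pairs / neg_pairs: (num_edges, 2) index pairs."""
    def score(pairs):
        return (embeddings[pairs[:, 0]] * embeddings[pairs[:, 1]]).sum(dim=-1)
    logits = torch.cat([score(pos_pairs), score(neg_pairs)])
    labels = torch.cat([torch.ones(len(pos_pairs)), torch.zeros(len(neg_pairs))])
    return F.binary_cross_entropy_with_logits(logits, labels)

# Toy usage on random embeddings and index pairs
emb = torch.randn(50, 32)
pos = torch.randint(0, 50, (20, 2))
neg = torch.randint(0, 50, (20, 2))
loss = link_prediction_loss(emb, pos, neg)
```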

Application:

DyGSSM is directly applicable to real-world dynamic graph problems such as:

  • Link Prediction: Forecasting future connections in social networks, collaboration graphs (like DBLP), or transaction networks (like Bitcoin). For instance, predicting future collaborations between authors, trust links between users, or communication paths.
  • Node Classification: Predicting node properties that change over time (though the paper focuses on link prediction, the node embeddings could be used for this).
  • Recommendation Systems: Modeling user-item interactions in dynamic recommendation graphs, where user preferences and item popularity evolve over time. The learned dynamic node embeddings could be used to improve recommendation accuracy.

The strength of DyGSSM lies in its ability to capture both local and global structural evolution and to leverage historical performance signals through the SSM for more informed parameter updates, potentially leading to better performance on dynamic tasks than models relying solely on message passing or adjacent-snapshot updates. The ablation study confirms the contribution of each component to the overall performance.
