- The paper introduces a multi-view model that fuses local GCN and global GRU embeddings to capture both short- and long-range graph structures.
- It employs a State Space Model with HiPPO-based gradient updates to effectively handle temporal dependencies in dynamic graph snapshots.
- The window-based training mechanism over successive snapshots improves scalability and models long-term dependencies for link prediction.
This paper introduces DyGSSM (Multi-view Dynamic Graph Embeddings with State Space Model Gradient Update) (2505.09017), a novel approach for dynamic graph representation learning, specifically applied to the task of link prediction. The authors address limitations in existing dynamic graph methods, which often struggle to simultaneously capture both local and global structural information within graph snapshots and fail to effectively manage temporal dependencies during model parameter updates.
DyGSSM tackles these challenges by proposing a multi-view architecture that combines local and global feature extraction for each graph snapshot and integrates a State Space Model (SSM) with a HiPPO-based gradient update mechanism.
Here's a breakdown of the DyGSSM implementation and application:
- Multi-view Node Embedding per Snapshot:
- Local View: A standard Graph Convolutional Network (GCN) is used to capture local structural information for each node in a snapshot $G_t$. The GCN aggregates information from immediate neighbors through message passing. The implementation follows a layer update style similar to WinGNN (Mou et al., 2023), using shared trainable weight matrices and skip connections across layers. This produces local node embeddings $X_t^{local}$.
- Global View: To capture broader graph structure beyond immediate neighbors, a biased Random Walk (RW) coupled with a Gated Recurrent Unit (GRU) is employed. For each node, biased random walks (tuned towards exploring distant nodes) are performed to generate node sequences. A fixed number of the most frequent nodes from these sequences is selected. These sequences are then processed by a two-layer GRU (or a lightweight variant, Light GRU) to produce global node embeddings $X_t^{global}$. Weights for the GRU layers are shared across snapshots. The use of Light GRU (Feng et al., 2024) is an implementation consideration for scalability, offering reduced parameters and parallelism.
- Integration: The local ($X_t^{local}$) and global ($X_t^{global}$) embeddings for the same snapshot are fused using a cross-attention mechanism. The local embeddings serve as the query, and the global embeddings as key and value. This attention mechanism learns to weigh the importance of global information relative to local information when creating the final fused embedding $X_t^{fused}$. A single-head attention is used for parameter efficiency (a minimal sketch of this multi-view pipeline follows this sub-list).
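To make the per-snapshot pipeline concrete, below is a minimal PyTorch sketch of the three views: a GCN for the local view, a node2vec-style biased walk plus a two-layer GRU for the global view, and single-head cross-attention for fusion. The module and function names (`LocalGCN`, `GlobalGRU`, `CrossAttentionFusion`, `biased_random_walk`), the dimensions, and the exact propagation and bias rules are illustrative assumptions rather than the authors' implementation.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def biased_random_walk(neighbors, start, length, q=0.5):
    """node2vec-style walk; q < 1 up-weights moving away from the previous
    node (DFS-like), approximating the paper's bias toward distant nodes."""
    walk, prev = [start], None
    while len(walk) < length:
        cur = walk[-1]
        nbrs = neighbors.get(cur, [])
        if not nbrs:
            break
        weights = [1.0 if n == prev else 1.0 / q for n in nbrs]
        prev, nxt = cur, random.choices(nbrs, weights=weights, k=1)[0]
        walk.append(nxt)
    return walk

class LocalGCN(nn.Module):
    """Two-layer GCN with shared weight style and a skip connection (local view)."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)

    def forward(self, x, adj_norm):
        # x: [N, in_dim] node features; adj_norm: [N, N] normalized adjacency of G_t
        h = F.relu(adj_norm @ self.lin1(x))
        return F.relu(adj_norm @ self.lin2(h)) + h           # X_t^local

class GlobalGRU(nn.Module):
    """Two-layer GRU over random-walk node sequences (global view)."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid_dim, num_layers=2, batch_first=True)

    def forward(self, walk_feats):
        # walk_feats: [N, L, in_dim] features of the most frequent walk nodes per node
        _, h_n = self.gru(walk_feats)
        return h_n[-1]                                       # X_t^global, [N, hid_dim]

class CrossAttentionFusion(nn.Module):
    """Single-head cross-attention: local embeddings query the global view."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x_local, x_global):
        q, k, v = self.q(x_local), self.k(x_global), self.v(x_global)
        attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v + x_local                            # X_t^fused
```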
- SSM-based Gradient Update:
- The core mechanism for handling temporal dependencies during parameter updates involves an SSM. The model calculates a prediction loss $L_t^{fused}$ for the fused embeddings $X_t^{fused}$ at the current snapshot $t$ (e.g., binary cross-entropy for link prediction). A separate loss $L_t^{global}$ is calculated for the global embeddings $X_t^{global}$ to update the GRU parameters.
- For updating the GCN and cross-attention parameters, the gradient of $L_t^{fused}$ with respect to the model parameters, $\frac{\delta L_t^{fused}}{\delta \Theta_t}$, is computed.
- A dynamic weight $weight_t = \frac{1}{L_t^{fused} + \epsilon}$ is calculated, inversely proportional to the prediction loss. This ensures that snapshots where the model performs poorly contribute less to the state update.
- An SSM state $s_t$ is maintained and updated based on the previous state $s_{t-1}$, the current gradient $\hat{G}_t$ (derived from $\frac{\delta L_t^{fused}}{\delta \Theta_t}$), and the dynamic weight $weight_t$: $s_t = \hat{K} s_{t-1} + \hat{G}_t \cdot weight_t$. The matrix $\hat{K}$ is initialized using the HiPPO algorithm (Gu et al., 2022) to effectively preserve and project historical information.
- The model parameters $\Theta_{t+1}$ for the next snapshot are updated based on the current parameters $\Theta_t$, the current gradient $\frac{\delta L_t^{fused}}{\delta \Theta_t}$, and the SSM state $s_t$: $\Theta_{t+1} \leftarrow \frac{\delta L_t^{fused}}{\delta \Theta_t} + \Theta_t \times s_t$. This incorporates a form of meta-learning where past performance (captured by the SSM state influenced by weighted historical gradients) guides the parameter update for the future (a minimal sketch of this update follows this sub-list).
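The following is a minimal sketch of this loss-weighted SSM update over a flattened parameter vector. The HiPPO-LegS construction shown, the function names (`hippo_legs`, `ssm_param_update`), and the treatment of the raw gradient as the SSM input $\hat{G}_t$ are assumptions for illustration; the paper's exact projection and matrix shapes may differ.

```python
import torch

def hippo_legs(n):
    """One common HiPPO-LegS matrix construction, used here to initialize K_hat
    (an assumption; DyGSSM may use a different HiPPO variant or scaling)."""
    A = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            if i > j:
                A[i, j] = (2 * i + 1) ** 0.5 * (2 * j + 1) ** 0.5
            elif i == j:
                A[i, j] = i + 1
    return -A

def ssm_param_update(theta, grad, state, K_hat, loss, eps=1e-8):
    """One loss-weighted SSM state and parameter update step.

    theta, grad, state: flattened tensors of shape [D]
    K_hat:              [D, D] HiPPO-initialized transition matrix
    loss:               scalar fused prediction loss L_t^fused
    """
    weight = 1.0 / (loss + eps)              # weight_t = 1 / (L_t^fused + eps)
    state = K_hat @ state + grad * weight    # s_t = K_hat s_{t-1} + G_hat_t * weight_t
    theta = grad + theta * state             # Theta_{t+1} <- grad + Theta_t * s_t
    return theta, state
```

For large parameter counts a dense $\hat{K}$ over the full parameter vector would be expensive, so a practical implementation would likely apply the recurrence to a lower-dimensional gradient projection; the sketch keeps the full vector only for clarity.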
- Window-based Multi-snapshot Training:
- Instead of updating parameters based only on adjacent snapshots, DyGSSM uses an overlapping sliding window of $\Delta t$ snapshots.
- Within a window, snapshots are processed sequentially. For each snapshot $t$ in the window, the local, global, and fused embeddings are computed, and the losses $L_t^{fused}$ and $L_t^{global}$ are calculated. The SSM state and parameters $\Theta_t$ are updated based on $L_t^{fused}$ and passed to the next snapshot $t+1$.
- This sequential processing within the window accumulates information. After processing all snapshots in the window, a final backpropagation step is performed using the aggregated losses across the window: $L^{fused} = \frac{1}{\Delta t}\sum_{i=t+1}^{t+\Delta t} L_i^{fused}$ and $L^{global} = \frac{1}{\Delta t}\sum_{i=t}^{t+\Delta t} L_i^{global}$. This leverages the window to capture longer dependencies than just adjacent snapshots.
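A minimal sketch of this window loop is shown below, assuming a user-supplied `snapshot_losses` callable that returns the fused and global losses for a snapshot, and reusing the `ssm_param_update` sketch above. The exact schedule of the window-level backpropagation (here applied only to the GRU through the aggregated global loss) is an assumption.

```python
import torch

def train_on_window(snapshot_losses, window_snapshots, gru_optimizer,
                    theta, ssm_state, K_hat):
    """Process one overlapping window of Delta_t consecutive snapshots.

    snapshot_losses(snapshot, theta) -> (loss_fused, loss_global), scalar tensors
    theta:     flattened GCN/cross-attention parameters (requires_grad=True)
    ssm_state: flattened SSM state carried across snapshots and windows
    """
    fused_losses, global_losses = [], []
    for snapshot in window_snapshots:
        loss_fused, loss_global = snapshot_losses(snapshot, theta)
        fused_losses.append(loss_fused.detach())
        global_losses.append(loss_global)

        # SSM-guided update of the GCN/cross-attention parameters for the
        # next snapshot in the window (see the ssm_param_update sketch above).
        grad = torch.autograd.grad(loss_fused, theta, retain_graph=True)[0]
        with torch.no_grad():
            theta, ssm_state = ssm_param_update(theta, grad, ssm_state,
                                                K_hat, loss_fused.item())
        theta.requires_grad_(True)

    # Window-level aggregation: mean fused and global losses over the window.
    loss_fused_win = torch.stack(fused_losses).mean()
    loss_global_win = torch.stack(global_losses).mean()

    # Final backpropagation for the window; here it drives the GRU parameters
    # through the aggregated global loss.
    gru_optimizer.zero_grad()
    loss_global_win.backward()
    gru_optimizer.step()
    return theta, ssm_state, loss_fused_win.item(), loss_global_win.item()
```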
Practical Implementation Details and Considerations:
- Preprocessing: The biased random walks for global feature extraction are precomputed offline for efficiency during training.
- Task: The model is applied to the link prediction task, formulated as a binary classification problem on pairs of nodes (a minimal scoring sketch follows this list).
- Architecture: The core components are GCNs, GRUs (or Light GRUs), cross-attention, and the SSM state logic. The GCN and cross-attention parameters are updated using the fused loss and SSM, while the GRU parameters are updated using the global loss.
- Optimization: Standard optimizers like Adam can be used. The training involves iterating through overlapping windows and performing backpropagation based on aggregated losses within the window.
- Scalability: The Light GRU variant is proposed to improve scalability by reducing parameters and enabling parallel computation within the sequence processing. Memory usage can still be a concern for large graphs and window sizes, as evidenced by the out-of-memory (OOM) errors encountered for some baselines on larger datasets.
- Hyperparameters: Key hyperparameters include the number of GCN layers, GRU layers, dimension sizes for embeddings, random walk length and repetitions, window size (Δt), learning rate, and optimizer settings. The choice of GRU variant (traditional vs. Light) is also an implementation choice.
- Evaluation: Performance is measured using standard metrics like Accuracy, AUC, MRR, and Recall@10, with emphasis on ranking metrics (MRR, Recall@10, AP) due to potential class imbalance.
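As a concrete example of the binary-classification formulation mentioned above, here is a minimal sketch of a link-prediction objective over fused node embeddings. The dot-product scorer, negative-sampling setup, and function name (`link_prediction_loss`) are illustrative assumptions; the paper may use a different decoder.

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(x_fused, pos_edges, neg_edges):
    """Binary cross-entropy over positive and (sampled) negative node pairs.

    x_fused:   [N, d] fused node embeddings X_t^fused for snapshot t
    pos_edges: [2, E] indices of node pairs that are linked
    neg_edges: [2, E] indices of sampled non-edges
    """
    def score(edges):
        src, dst = edges
        return (x_fused[src] * x_fused[dst]).sum(dim=-1)   # dot-product score

    pos_scores, neg_scores = score(pos_edges), score(neg_edges)
    labels = torch.cat([torch.ones_like(pos_scores), torch.zeros_like(neg_scores)])
    return F.binary_cross_entropy_with_logits(
        torch.cat([pos_scores, neg_scores]), labels)
```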
Application:
DyGSSM is directly applicable to real-world dynamic graph problems such as:
- Link Prediction: Forecasting future connections in social networks, collaboration graphs (like DBLP), or transaction networks (like Bitcoin). For instance, predicting future collaborations between authors, trust links between users, or communication paths.
- Node Classification: Predicting node properties that change over time (though the paper focuses on link prediction, the node embeddings could be used for this).
- Recommendation Systems: Modeling user-item interactions in dynamic recommendation graphs, where user preferences and item popularity evolve over time. The learned dynamic node embeddings could be used to improve recommendation accuracy.
The strength of DyGSSM lies in its ability to capture both local and global structural evolution and to leverage historical performance signals through the SSM for more informed parameter updates, potentially leading to better performance on dynamic tasks compared to models relying solely on message passing or adjacent snapshot updates. The ablation study confirms the contribution of each component to the overall performance.