- The paper introduces a multi-view model that fuses local GCN and global GRU embeddings to capture both short- and long-range graph structures.
- It employs a State Space Model with HiPPO-based gradient updates to effectively handle temporal dependencies in dynamic graph snapshots.
- The window-based training mechanism over successive snapshots improves scalability and models long-term dependencies for link prediction.
This paper introduces DyGSSM (Multi-view Dynamic Graph Embeddings with State Space Model Gradient Update) (2505.09017), a novel approach for dynamic graph representation learning, specifically applied to the task of link prediction. The authors address limitations in existing dynamic graph methods, which often struggle to simultaneously capture both local and global structural information within graph snapshots and fail to effectively manage temporal dependencies during model parameter updates.
DyGSSM tackles these challenges by proposing a multi-view architecture that combines local and global feature extraction for each graph snapshot and integrates a State Space Model (SSM) with a HiPPO-based gradient update mechanism.
Here's a breakdown of the DyGSSM implementation and application:
- Multi-view Node Embedding per Snapshot:
- Local View: A standard Graph Convolutional Network (GCN) is used to capture local structural information for each node in a snapshot $G_t$. The GCN aggregates information from immediate neighbors through message passing. The implementation follows a layer update style similar to WinGNN (Mou et al., 2023), using shared trainable weight matrices and skip connections across layers. This produces local node embeddings $X_t^{local}$.
- Global View: To capture broader graph structure beyond immediate neighbors, a biased Random Walk (RW) coupled with a Gated Recurrent Unit (GRU) is employed. For each node, biased random walks (tuned towards exploring distant nodes) are performed to generate node sequences. A fixed number of the most frequent nodes from these sequences is selected. These sequences are then processed by a two-layer GRU (or a lightweight variant, Light GRU) to produce global node embeddings $X_t^{global}$. Weights for the GRU layers are shared across snapshots. The use of Light GRU (Feng et al., 2024) is an implementation consideration for scalability, offering reduced parameters and parallelism.
- Integration: The local ($X_t^{local}$) and global ($X_t^{global}$) embeddings for the same snapshot are fused using a cross-attention mechanism. The local embeddings serve as the query, and the global embeddings as key and value. This attention mechanism learns to weigh the importance of global information relative to local information when creating the final fused embedding $X_t^{fused}$. A single-head attention is used for parameter efficiency (a minimal sketch of this multi-view pipeline follows this sub-list).
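To make the per-snapshot pipeline concrete, below is a minimal PyTorch sketch of the three views: a GCN for the local view, a node2vec-style biased walk plus a two-layer GRU for the global view, and single-head cross-attention for fusion. The module and function names (`LocalGCN`, `GlobalGRU`, `CrossAttentionFusion`, `biased_random_walk`), the dimensions, and the exact propagation and bias rules are illustrative assumptions rather than the authors' implementation.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def biased_random_walk(neighbors, start, length, q=0.5):
    """node2vec-style walk; q < 1 up-weights moving away from the previous
    node (DFS-like), approximating the paper's bias toward distant nodes."""
    walk, prev = [start], None
    while len(walk) < length:
        cur = walk[-1]
        nbrs = neighbors.get(cur, [])
        if not nbrs:
            break
        weights = [1.0 if n == prev else 1.0 / q for n in nbrs]
        prev, nxt = cur, random.choices(nbrs, weights=weights, k=1)[0]
        walk.append(nxt)
    return walk

class LocalGCN(nn.Module):
    """Two-layer GCN with shared weight style and a skip connection (local view)."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)

    def forward(self, x, adj_norm):
        # x: [N, in_dim] node features; adj_norm: [N, N] normalized adjacency of G_t
        h = F.relu(adj_norm @ self.lin1(x))
        return F.relu(adj_norm @ self.lin2(h)) + h           # X_t^local

class GlobalGRU(nn.Module):
    """Two-layer GRU over random-walk node sequences (global view)."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid_dim, num_layers=2, batch_first=True)

    def forward(self, walk_feats):
        # walk_feats: [N, L, in_dim] features of the most frequent walk nodes per node
        _, h_n = self.gru(walk_feats)
        return h_n[-1]                                       # X_t^global, [N, hid_dim]

class CrossAttentionFusion(nn.Module):
    """Single-head cross-attention: local embeddings query the global view."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x_local, x_global):
        q, k, v = self.q(x_local), self.k(x_global), self.v(x_global)
        attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v + x_local                            # X_t^fused
```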
- SSM-based Gradient Update:
- The core mechanism for handling temporal dependencies during parameter updates involves an SSM. The model calculates a prediction loss $L_t^{fused}$ for the fused embeddings $X_t^{fused}$ at the current snapshot $t$ (e.g., binary cross-entropy for link prediction). A separate loss $L_t^{global}$ is calculated for the global embeddings $X_t^{global}$ to update the GRU parameters.
- For updating the GCN and cross-attention parameters, the gradient of $L_t^{fused}$ with respect to the model parameters, $\frac{\delta L_t^{fused}}{\delta \Theta_t}$, is computed.
- A dynamic weight $weight_t = \frac{1}{L_t^{fused} + \epsilon}$ is calculated, inversely proportional to the prediction loss. This ensures that snapshots where the model performs poorly contribute less to the state update.
- An SSM state $s_t$ is maintained and updated based on the previous state $s_{t-1}$, the current gradient $\hat{G}_t$ (derived from $\frac{\delta L_t^{fused}}{\delta \Theta_t}$), and the dynamic weight $weight_t$: $s_t = \hat{K} s_{t-1} + \hat{G}_t \cdot weight_t$. The matrix $\hat{K}$ is initialized using the HiPPO algorithm (Gu et al., 2022) to effectively preserve and project historical information.
- The model parameters $\Theta_{t+1}$ for the next snapshot are updated based on the current parameters $\Theta_t$, the current gradient $\frac{\delta L_t^{fused}}{\delta \Theta_t}$, and the SSM state $s_t$: $\Theta_{t+1} \leftarrow \frac{\delta L_t^{fused}}{\delta \Theta_t} + \Theta_t \times s_t$. This incorporates a form of meta-learning where past performance (captured by the SSM state influenced by weighted historical gradients) guides the parameter update for the future (a minimal sketch of this update follows this sub-list).
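The following is a minimal sketch of this loss-weighted SSM update over a flattened parameter vector. The HiPPO-LegS construction shown, the function names (`hippo_legs`, `ssm_param_update`), and the treatment of the raw gradient as the SSM input $\hat{G}_t$ are assumptions for illustration; the paper's exact projection and matrix shapes may differ.

```python
import torch

def hippo_legs(n):
    """One common HiPPO-LegS matrix construction, used here to initialize K_hat
    (an assumption; DyGSSM may use a different HiPPO variant or scaling)."""
    A = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            if i > j:
                A[i, j] = (2 * i + 1) ** 0.5 * (2 * j + 1) ** 0.5
            elif i == j:
                A[i, j] = i + 1
    return -A

def ssm_param_update(theta, grad, state, K_hat, loss, eps=1e-8):
    """One loss-weighted SSM state and parameter update step.

    theta, grad, state: flattened tensors of shape [D]
    K_hat:              [D, D] HiPPO-initialized transition matrix
    loss:               scalar fused prediction loss L_t^fused
    """
    weight = 1.0 / (loss + eps)              # weight_t = 1 / (L_t^fused + eps)
    state = K_hat @ state + grad * weight    # s_t = K_hat s_{t-1} + G_hat_t * weight_t
    theta = grad + theta * state             # Theta_{t+1} <- grad + Theta_t * s_t
    return theta, state
```

For large parameter counts a dense $\hat{K}$ over the full parameter vector would be expensive, so a practical implementation would likely apply the recurrence to a lower-dimensional gradient projection; the sketch keeps the full vector only for clarity.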
- Window-based Multi-snapshot Training:
- Instead of updating parameters based only on adjacent snapshots, DyGSSM uses an overlapping sliding window of $\Delta t$ snapshots.
- Within a window, snapshots are processed sequentially. For each snapshot $t$ in the window, the local, global, and fused embeddings are computed, and the losses $L_t^{fused}$ and $L_t^{global}$ are calculated. The SSM state and parameters $\Theta_t$ are updated based on $L_t^{fused}$ and passed to the next snapshot $t+1$.
- This sequential processing within the window accumulates information. After processing all snapshots in the window, a final backpropagation step is performed using the aggregated losses across the window: $L^{fused} = \frac{1}{\Delta t}\sum_{i=t+1}^{t+\Delta t} L_i^{fused}$ and $L^{global} = \frac{1}{\Delta t}\sum_{i=t}^{t+\Delta t} L_i^{global}$. This leverages the window to capture longer dependencies than just adjacent snapshots.
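A minimal sketch of this window loop is shown below, assuming a user-supplied `snapshot_losses` callable that returns the fused and global losses for a snapshot, and reusing the `ssm_param_update` sketch above. The exact schedule of the window-level backpropagation (here applied only to the GRU through the aggregated global loss) is an assumption.

```python
import torch

def train_on_window(snapshot_losses, window_snapshots, gru_optimizer,
                    theta, ssm_state, K_hat):
    """Process one overlapping window of Delta_t consecutive snapshots.

    snapshot_losses(snapshot, theta) -> (loss_fused, loss_global), scalar tensors
    theta:     flattened GCN/cross-attention parameters (requires_grad=True)
    ssm_state: flattened SSM state carried across snapshots and windows
    """
    fused_losses, global_losses = [], []
    for snapshot in window_snapshots:
        loss_fused, loss_global = snapshot_losses(snapshot, theta)
        fused_losses.append(loss_fused.detach())
        global_losses.append(loss_global)

        # SSM-guided update of the GCN/cross-attention parameters for the
        # next snapshot in the window (see the ssm_param_update sketch above).
        grad = torch.autograd.grad(loss_fused, theta, retain_graph=True)[0]
        with torch.no_grad():
            theta, ssm_state = ssm_param_update(theta, grad, ssm_state,
                                                K_hat, loss_fused.item())
        theta.requires_grad_(True)

    # Window-level aggregation: mean fused and global losses over the window.
    loss_fused_win = torch.stack(fused_losses).mean()
    loss_global_win = torch.stack(global_losses).mean()

    # Final backpropagation for the window; here it drives the GRU parameters
    # through the aggregated global loss.
    gru_optimizer.zero_grad()
    loss_global_win.backward()
    gru_optimizer.step()
    return theta, ssm_state, loss_fused_win.item(), loss_global_win.item()
```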
Practical Implementation Details and Considerations:
- Preprocessing: The biased random walks for global feature extraction are precomputed offline for efficiency during training.
- Task: The model is applied to the link prediction task, formulated as a binary classification problem on pairs of nodes (a minimal scoring sketch follows this list).
- Architecture: The core components are GCNs, GRUs (or Light GRUs), cross-attention, and the SSM state logic. The GCN and cross-attention parameters are updated using the fused loss and SSM, while the GRU parameters are updated using the global loss.
- Optimization: Standard optimizers like Adam can be used. The training involves iterating through overlapping windows and performing backpropagation based on aggregated losses within the window.
- Scalability: The Light GRU variant is proposed to improve scalability by reducing parameters and enabling parallel computation within the sequence processing. Memory usage can still be a concern for large graphs and window sizes, as evidenced by the out-of-memory (OOM) errors encountered for some baselines on larger datasets.
- Hyperparameters: Key hyperparameters include the number of GCN layers, GRU layers, dimension sizes for embeddings, random walk length and repetitions, window size (Δt), learning rate, and optimizer settings. The choice of GRU variant (traditional vs. Light) is also an implementation choice.
- Evaluation: Performance is measured using standard metrics like Accuracy, AUC, MRR, and Recall@10, with emphasis on ranking metrics (MRR, Recall@10, AP) due to potential class imbalance.
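As a concrete example of the binary-classification formulation mentioned above, here is a minimal sketch of a link-prediction objective over fused node embeddings. The dot-product scorer, negative-sampling setup, and function name (`link_prediction_loss`) are illustrative assumptions; the paper may use a different decoder.

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(x_fused, pos_edges, neg_edges):
    """Binary cross-entropy over positive and (sampled) negative node pairs.

    x_fused:   [N, d] fused node embeddings X_t^fused for snapshot t
    pos_edges: [2, E] indices of node pairs that are linked
    neg_edges: [2, E] indices of sampled non-edges
    """
    def score(edges):
        src, dst = edges
        return (x_fused[src] * x_fused[dst]).sum(dim=-1)   # dot-product score

    pos_scores, neg_scores = score(pos_edges), score(neg_edges)
    labels = torch.cat([torch.ones_like(pos_scores), torch.zeros_like(neg_scores)])
    return F.binary_cross_entropy_with_logits(
        torch.cat([pos_scores, neg_scores]), labels)
```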
Application:
DyGSSM is directly applicable to real-world dynamic graph problems such as:
- Link Prediction: Forecasting future connections in social networks, collaboration graphs (like DBLP), or transaction networks (like Bitcoin). For instance, predicting future collaborations between authors, trust links between users, or communication paths.
- Node Classification: Predicting node properties that change over time (though the paper focuses on link prediction, the node embeddings could be used for this).
- Recommendation Systems: Modeling user-item interactions in dynamic recommendation graphs, where user preferences and item popularity evolve over time. The learned dynamic node embeddings could be used to improve recommendation accuracy.
The strength of DyGSSM lies in its ability to capture both local and global structural evolution and to leverage historical performance signals through the SSM for more informed parameter updates, potentially leading to better performance on dynamic tasks compared to models relying solely on message passing or adjacent snapshot updates. The ablation study confirms the contribution of each component to the overall performance.