Dual-Stream GNN-LSTM Architecture
- Dual-stream GNN-LSTM networks are architectures that combine graph neural networks (GNNs) with long short-term memory (LSTM) networks to jointly model spatial and temporal dependencies.
- They leverage coordinated streams—parallel, serial, and dual-recurrence—to enhance performance in dynamic graph modeling, forecasting, and interaction prediction.
- Empirical results demonstrate improved accuracy and robustness across domains like stock prediction, network performance, and drug interaction analysis.
A Dual-Stream GNN-LSTM Network is a neural architecture designed to jointly model the spatial relationships encoded by graphs and the temporal or hierarchical dependencies that arise in structured, sequential, or multi-resolution data. This metamodel integrates a Graph Neural Network (GNN) stream and a Long Short-Term Memory (LSTM) stream. Various implementations exist, spanning parallel, serial, and dual-recurrent variants to address tasks such as structured entity interaction, dynamic graph modeling, network performance prediction, and multivariate time series forecasting.
1. Architectural Principles and Taxonomy
Dual-stream GNN-LSTM architectures are primarily characterized by two coordinated computational streams:
- GNN Stream: Performs message-passing or convolution over static or dynamic graph structure, generating node, edge, or graph-level embeddings that capture topology and local substructure.
- LSTM Stream: Processes sequences—either of node/graph representations across time or across hierarchical graph resolutions—to extract long-range temporal, multiscale, or interaction dependencies not easily encoded by GNN layers alone.
A canonical taxonomy spans:
- Parallel streams, where GNN and LSTM operate in parallel on distinct input modalities, followed by feature fusion (e.g., hybrid stock price prediction (Sonani et al., 19 Feb 2025)).
- Serial streams, where the GNN output at each step forms the LSTM input sequence (e.g., dynamic graph convolution (Manessi et al., 2017)); a minimal sketch of this variant appears after the list.
- Dual-recurrence, in which both streams are unrolled and possibly interact at multiple resolutions (e.g., MR-GNN’s S-LSTM and I-LSTM (Xu et al., 2019)).
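As a concrete illustration of the serial pattern, the following minimal PyTorch sketch applies a shared graph convolution at every timestep and feeds the pooled graph states to an LSTM. The layer sizes, mean-pooling readout, and dense adjacency format are illustrative assumptions, not the configuration of any cited work.

```python
import torch
import torch.nn as nn

class SerialGNNLSTM(nn.Module):
    """Serial variant: a GCN layer is applied at every timestep, and the
    resulting graph embeddings form the LSTM input sequence."""
    def __init__(self, in_dim, gnn_dim, lstm_dim):
        super().__init__()
        self.gcn_weight = nn.Linear(in_dim, gnn_dim)  # shared GCN weights across time
        self.lstm = nn.LSTM(gnn_dim, lstm_dim, batch_first=True)

    def forward(self, x_seq, adj_norm):
        # x_seq: (T, N, in_dim) node features per timestep
        # adj_norm: (N, N) normalized adjacency (assumed precomputed)
        embeds = []
        for x_t in x_seq:  # one GCN pass per timestep
            h_t = torch.relu(adj_norm @ self.gcn_weight(x_t))  # (N, gnn_dim)
            embeds.append(h_t.mean(dim=0))  # mean-pool nodes to a graph vector
        seq = torch.stack(embeds).unsqueeze(0)  # (1, T, gnn_dim)
        out, _ = self.lstm(seq)  # temporal modeling over graph states
        return out[:, -1]  # last hidden state as the sequence summary

# Tiny smoke test with random data.
model = SerialGNNLSTM(in_dim=8, gnn_dim=16, lstm_dim=32)
x_seq = torch.randn(5, 10, 8)   # T=5 timesteps, N=10 nodes
adj = torch.eye(10)             # placeholder adjacency
print(model(x_seq, adj).shape)  # torch.Size([1, 32])
```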
2. Core Architectural Implementations
2.1 Multi-Resolution, Dual LSTM GNN (MR-GNN)
MR-GNN (Xu et al., 2019) is an end-to-end network for predicting interactions between two graphs, combining multi-resolution GNN layers with a dual LSTM architecture:
- Multi-resolution GNN: For each input graph $G_x$ and $G_y$, $R$ weighted graph-convolutional and pooling layers extract features at each resolution $r = 1, \dots, R$. A graph-gather operation summarizes node embeddings into graph-state vectors $g_x^{(r)}$ and $g_y^{(r)}$.
- Dual LSTMs:
- S-LSTM (Summary-LSTM): Independently processes the multi-resolution graph-state sequence for each graph, yielding a global summary $s_x$ (respectively $s_y$).
- I-LSTM (Interaction-LSTM): Jointly processes the concatenated pairwise states $[g_x^{(r)} \| g_y^{(r)}]$ across scales, yielding an aggregated interaction state $s_I$.
- Fusion: The final prediction concatenates the summary and interaction states $[s_x \| s_y \| s_I]$ and passes them through an MLP trained with cross-entropy loss.
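The dual-LSTM readout can be sketched in PyTorch as follows, assuming the multi-resolution graph-state sequences have already been computed. Sharing S-LSTM weights across the two graphs and the specific dimensions are illustrative assumptions, not MR-GNN's exact design.

```python
import torch
import torch.nn as nn

class DualLSTMReadout(nn.Module):
    """Dual-LSTM readout in the spirit of MR-GNN: an S-LSTM summarizes each
    graph's multi-resolution states; an I-LSTM processes concatenated
    pairwise states to capture cross-graph interaction."""
    def __init__(self, state_dim, hidden_dim, n_classes):
        super().__init__()
        self.s_lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)  # assumed shared
        self.i_lstm = nn.LSTM(2 * state_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(3 * hidden_dim, hidden_dim),
                                  nn.ReLU(),
                                  nn.Linear(hidden_dim, n_classes))

    def forward(self, states_x, states_y):
        # states_x, states_y: (B, R, state_dim) graph states over R resolutions
        _, (s_x, _) = self.s_lstm(states_x)  # summary of graph x
        _, (s_y, _) = self.s_lstm(states_y)  # summary of graph y
        _, (s_i, _) = self.i_lstm(torch.cat([states_x, states_y], dim=-1))  # interaction
        fused = torch.cat([s_x[-1], s_y[-1], s_i[-1]], dim=-1)  # concatenation fusion
        return self.head(fused)  # logits for cross-entropy loss

# Smoke test: batch of 4 graph pairs, R=3 resolutions, 64-dim states.
readout = DualLSTMReadout(state_dim=64, hidden_dim=32, n_classes=2)
sx, sy = torch.randn(4, 3, 64), torch.randn(4, 3, 64)
print(readout(sx, sy).shape)  # torch.Size([4, 2])
```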
2.2 Parallel Hybrid LSTM-GNN for Multivariate Forecasting
In the hybrid LSTM-GNN model for stock price prediction (Sonani et al., 19 Feb 2025):
- GNN Stream: A graph is constructed based on correlated time series (e.g., using Pearson or Apriori analysis). Node features are propagated with GCN layers, producing a relational embedding $h_{\mathrm{GNN}}$.
- LSTM Stream: An LSTM models the temporal sequence for each node (e.g., each stock), producing a temporal embedding $h_{\mathrm{LSTM}}$.
- Fusion: The temporal and relational embeddings are concatenated into $[h_{\mathrm{LSTM}} \| h_{\mathrm{GNN}}]$ and input to dense layers for scalar regression, with MSE loss and expanding-window evaluation.
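A minimal sketch of this parallel fusion follows, with a single linear layer standing in for the GCN stack; the input shapes, layer sizes, and normalized-adjacency format are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ParallelLSTMGNN(nn.Module):
    """Parallel variant: the LSTM reads each node's history while the GCN
    reads cross-sectional node features over a correlation graph; the two
    embeddings are concatenated and regressed to a scalar per node."""
    def __init__(self, n_feats, lstm_dim, gnn_dim):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, lstm_dim, batch_first=True)
        self.gcn = nn.Linear(n_feats, gnn_dim)
        self.regressor = nn.Sequential(nn.Linear(lstm_dim + gnn_dim, 64),
                                       nn.ReLU(),
                                       nn.Linear(64, 1))

    def forward(self, history, features, adj_norm):
        # history:  (N, T, n_feats) per-node time series
        # features: (N, n_feats)    current node features
        # adj_norm: (N, N)          normalized correlation-graph adjacency
        _, (h_t, _) = self.lstm(history)  # temporal stream
        h_temporal = h_t[-1]  # (N, lstm_dim)
        h_relational = torch.relu(adj_norm @ self.gcn(features))  # (N, gnn_dim)
        fused = torch.cat([h_temporal, h_relational], dim=-1)  # feature fusion
        return self.regressor(fused).squeeze(-1)  # (N,) next-step prediction

model = ParallelLSTMGNN(n_feats=4, lstm_dim=32, gnn_dim=16)
pred = model(torch.randn(10, 30, 4), torch.randn(10, 4), torch.eye(10))
print(pred.shape)  # torch.Size([10]); train with nn.MSELoss()
```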
2.3 Spatiotemporal Message-Passing with LSTM Cells
RouteNet-Fermi (Verma et al., 7 Dec 2024) generalizes GNN message-passing by integrating RNN, LSTM, or GRU cells at each node:
- Each node (flow, queue, link) updates its state at each message-passing round via a recurrent cell, $h_v^{(t+1)} = \mathrm{RNNCell}(h_v^{(t)}, m_v^{(t)})$.
- Aggregated messages are computed from neighbors via edge-MLPs, and the LSTM cell captures both local message dependencies and temporal state.
- After $T$ message-passing steps, the final hidden states yield predictions (e.g., delay, jitter, loss) via small MLPs.
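A simplified sketch of this pattern: each node carries an (h, c) pair, an edge-MLP produces messages, and a shared LSTMCell consumes the sum-aggregated message at every round. Treating all nodes homogeneously and using sum aggregation are simplifying assumptions; RouteNet-Fermi distinguishes flow, queue, and link elements.

```python
import torch
import torch.nn as nn

class LSTMMessagePassing(nn.Module):
    """Message passing with recurrent state updates: an edge-MLP computes
    messages, which are sum-aggregated per receiver and fed to an LSTMCell."""
    def __init__(self, state_dim, rounds=4):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * state_dim, state_dim), nn.ReLU())
        self.cell = nn.LSTMCell(state_dim, state_dim)  # message -> state update
        self.readout = nn.Linear(state_dim, 1)         # e.g., per-node delay
        self.rounds = rounds

    def forward(self, h, edge_index):
        # h: (N, state_dim) initial node states; edge_index: (2, E) src/dst ids
        c = torch.zeros_like(h)
        src, dst = edge_index
        for _ in range(self.rounds):
            msgs = self.edge_mlp(torch.cat([h[src], h[dst]], dim=-1))  # (E, D)
            agg = torch.zeros_like(h).index_add_(0, dst, msgs)  # sum per receiver
            h, c = self.cell(agg, (h, c))  # recurrent state update
        return self.readout(h).squeeze(-1)  # predictions from final states

mp = LSTMMessagePassing(state_dim=16)
h0 = torch.randn(5, 16)
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # 4 directed edges
print(mp(h0, edges).shape)  # torch.Size([5])
```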
3. Mathematical Formulations
A dual-stream GNN-LSTM is underpinned by equations of the following standard forms (notation reconstructed in common convention; see the cited works for the exact instantiations):

Weighted GNN layer (Xu et al., 2019):

$$H^{(l+1)} = \sigma\!\left(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}H^{(l)}W^{(l)}\right), \qquad \hat{A} = A + I,$$

where $A$ is the (edge-weighted) adjacency matrix, $\hat{D}$ the degree matrix of $\hat{A}$, $H^{(l)}$ the node features at layer $l$, and $W^{(l)}$ a learned weight matrix.

LSTM update (Xu et al., 2019, Verma et al., 7 Dec 2024):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
$$

Message-passing RNN cell (Verma et al., 7 Dec 2024):

$$h_v^{(t+1)} = \mathrm{RNNCell}\!\left(h_v^{(t)},\, m_v^{(t)}\right), \qquad m_v^{(t)} = \sum_{u \in \mathcal{N}(v)} \mathrm{MLP}\!\left(h_u^{(t)}, h_v^{(t)}\right),$$

where $m_v^{(t)}$ is the message aggregated from the neighbors of node $v$ and the cell may be a plain RNN, LSTM, or GRU.
4. Applications and Empirical Results
Dual-stream GNN-LSTM architectures have demonstrated gains in both prediction accuracy and robustness across diverse domains, as summarized below.
| Domain | Key Architecture | Main Metric(s) | Results Highlights |
|---|---|---|---|
| Drug/chemical CCI/DDI | MR-GNN dual LSTM–GNN (Xu et al., 2019) | Accuracy, AUC, F1 | +2.5% accuracy, +11.8% AUC over DeepCCI |
| Stock prediction | Parallel LSTM–GNN (Sonani et al., 19 Feb 2025) | MSE | 0.00144 vs. 0.00161 for LSTM-only (–10.6%) |
| Network perf. | Msg.-passing LSTM-GNN (Verma et al., 7 Dec 2024) | MAPE, MAE | LSTM MAPE 0.33–2.21% (vs. 0.69–5.39% RNN) |
| Dynamic graphs | GCN→LSTM (Manessi et al., 2017) | Acc, F1 | 70% vs. ~55–62% (GCN/LSTM/FC baselines) |
In each case, ablation studies confirm the complementary value of both streams: removing either the GNN or the LSTM components degrades performance. For example, in MR-GNN, eliminating the interaction LSTM reduces Macro-F1 from 93.5% to 92.8% (Xu et al., 2019); in RouteNet-Fermi, LSTM message-passing consistently outperforms simple RNNs as network scale and traffic burstiness increase (Verma et al., 7 Dec 2024).
5. Training Paradigms and Fusion Mechanisms
A range of training regimes and fusion methods are reported:
- Fusion by concatenation: Most works concatenate GNN and LSTM embeddings (either node- or graph-level) before an MLP or dense head (Sonani et al., 19 Feb 2025, Xu et al., 2019).
- Multi-task joint loss: Networks may be trained with multi-objective losses across several prediction tasks (Verma et al., 7 Dec 2024).
- Expanding window: In time series applications, expanding-window validation and continual retraining are used to adapt to non-stationary data (Sonani et al., 19 Feb 2025); a minimal split-generation sketch follows this list.
- BPTT through GNN: For temporally unrolled models (e.g., dynamic GCN-LSTM), full backpropagation through both streams and through time is performed (Manessi et al., 2017).
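For the expanding-window regime, a split generator might look like the following; the initial window and fold sizes are hypothetical values, not those of the cited work.

```python
import numpy as np

def expanding_window_splits(n_samples, initial_train, test_size):
    """Yield (train_idx, test_idx) pairs where the training window grows by
    test_size after each fold, mimicking continual retraining on
    non-stationary series."""
    start = initial_train
    while start + test_size <= n_samples:
        yield np.arange(0, start), np.arange(start, start + test_size)
        start += test_size

# Example: 100 observations, first 60 for initial training, 10-step test folds.
for train_idx, test_idx in expanding_window_splits(100, 60, 10):
    print(f"train [0, {train_idx[-1]}] -> test [{test_idx[0]}, {test_idx[-1]}]")
    # refit or fine-tune the dual-stream model here before scoring the fold
```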
6. Design Considerations and Limitations
Design choices are highly dependent on data domain and modeling objectives:
- Stream interaction: In some models (MR-GNN), the streams interact hierarchically across graph resolutions and across entities (Xu et al., 2019); in others (hybrid forecasting), LSTM and GNN are independent and only fused at the output (Sonani et al., 19 Feb 2025).
- Cell type selection: LSTM cells are preferred over simple RNNs or GRUs for scenarios requiring long-range temporal memory or highly bursty input (Verma et al., 7 Dec 2024), though GRUs may be favored under strong resource constraints.
- Graph construction: The efficacy of the GNN stream is sensitive to the underlying graph topology, with domain-specific thresholds (e.g., correlation/lift in financial graphs (Sonani et al., 19 Feb 2025)) affecting inter-node relational expressivity; a construction sketch follows this list.
- Computational cost: Dual-stream architectures incur added parameters and computation, requiring careful tuning and sometimes incremental or resource-adaptive deployment (Sonani et al., 19 Feb 2025).
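As an illustration of correlation-based graph construction, the sketch below thresholds a Pearson correlation matrix of log returns and applies the symmetric normalization commonly used before a GCN layer. The 0.6 threshold is a hypothetical knob, not a value from the cited work.

```python
import numpy as np

def correlation_graph(prices, threshold=0.6):
    """Build a normalized adjacency matrix from pairwise Pearson correlation
    of log-return series; edges connect pairs above the threshold."""
    returns = np.diff(np.log(prices), axis=1)  # (N, T-1) log returns
    corr = np.corrcoef(returns)                # (N, N) Pearson matrix
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                 # drop raw self-correlations
    adj_hat = adj + np.eye(len(adj))           # add self-loops (A + I)
    deg = adj_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ adj_hat @ d_inv_sqrt   # D^{-1/2} (A + I) D^{-1/2}

prices = np.abs(np.random.randn(10, 250)) + 1.0  # 10 assets, 250 days (synthetic)
adj_norm = correlation_graph(prices)
print(adj_norm.shape)  # (10, 10)
```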
Known limitations include sensitivity to hyperparameters, reliance on informative graph construction, and the risk of overfitting under continual retraining with expanding windows.
7. Empirical Guidelines and Future Implications
Empirical evidence indicates that:
- Dual-stream GNN-LSTM networks produce state-of-the-art results when both spatial and temporal/hierarchical dependencies matter.
- LSTM-equipped GNNs show generalization benefits in domains with nontrivial sequential, bursty, or multiscale dynamics, with ablation indicating that removal of either stream leads to significant loss of predictive performance.
- Application-specific hyperparameter tuning, targeted ablation analysis, and careful attention to graph construction heuristics are recommended to maximize performance and robustness.
A plausible implication is that as graph-structured and sequential data become more prevalent in real-world applications, hybrid dual-stream GNN-LSTM architectures will remain an essential methodological foundation for a broad class of forecasting and relational modeling tasks (Xu et al., 2019, Sonani et al., 19 Feb 2025, Verma et al., 7 Dec 2024, Manessi et al., 2017).