Graph Recurrent Neural Networks

Updated 8 March 2026

Graph Recurrent Neural Networks (GRNNs) are neural architectures that interleave graph-based message passing with recurrent updates to capture spatial and temporal dependencies.
They often use LSTM- or GRU-style gating to mitigate over-smoothing and preserve discriminative node features in deep or dynamic graph models.
GRNNs have been applied to tasks such as graph-level prediction, dynamic link prediction, and video analysis, with reported results that are state-of-the-art or competitive on a range of benchmarks.

Graph Recurrent Neural Networks (GRNNs) are a broad class of neural architectures that interleave graph-based message passing with explicit state recurrence, enabling the modeling of both spatial dependencies defined by a graph structure and temporal or sequential relationships. GRNNs generalize classical recurrent neural networks (RNNs) and graph neural networks (GNNs), augmenting the capacity for long-range relational reasoning, robust temporal or iterative propagation, and principled handling of dynamic or multi-relational structures.

1. Fundamental GRNN Architectures and Recurrence

A GRNN typically maintains node-wise hidden states $h_v^{(t)}$ that evolve through explicit recurrence. In each iteration or time step, hidden states are updated as a function of previous node states and the aggregation of the states of their graph neighbors. This principle underlies classical formulations in both the static and dynamic (or temporally evolving) graph settings.

Generalized GRNN Update

At iteration $t$, for node $v$:

$h_v^{(t)} = \text{RNNCell}\left( h_v^{(t-1)}, \mathrm{AGG}\left( \{ h_u^{(t-1)} : u\in \mathcal{N}(v)\} \right) \right)$

where $\mathrm{AGG}$ is a permutation-invariant function (sum, mean, max, etc.) over the neighbor states, and $\text{RNNCell}$ can be a vanilla RNN, GRU, LSTM, or a more specialized gating unit (Huang et al., 2019, Song, 2019, Ruiz et al., 2020, Li et al., 2019).
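As a minimal sketch of this update, the Python code below assumes a mean aggregator for $\mathrm{AGG}$, a vanilla (Elman) RNN cell $h_v^{(t)} = \tanh(W m_v^{(t)} + U h_v^{(t-1)} + b)$, and dictionary-based adjacency lists. The helper names and the fixed weight values are hypothetical and chosen only for illustration. All nodes are updated synchronously from the previous iteration's states, and the same weights are reused at every iteration.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_aggregate(states: List[Vector], dim: int) -> Vector:
    """Permutation-invariant AGG: element-wise mean (zero vector for isolated nodes)."""
    if not states:
        return [0.0] * dim
    return [sum(col) / len(states) for col in zip(*states)]

def grnn_step(h: Dict[int, Vector], adj: Dict[int, List[int]],
              W: Matrix, U: Matrix, b: Vector) -> Dict[int, Vector]:
    """One synchronous update h_v^(t) = tanh(W m_v + U h_v^(t-1) + b),
    where m_v = AGG({h_u^(t-1) : u in N(v)})."""
    dim = len(b)
    new_h = {}
    for v, neighbors in adj.items():
        m = mean_aggregate([h[u] for u in neighbors], dim)
        pre = [wm + uh + bi for wm, uh, bi in zip(matvec(W, m), matvec(U, h[v]), b)]
        new_h[v] = [math.tanh(p) for p in pre]
    return new_h

adj = {0: [1], 1: [0, 2], 2: [1]}              # path graph 0 - 1 - 2
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}
W = [[0.5, 0.0], [0.0, 0.5]]                   # illustrative weights, not trained
U = [[0.8, 0.1], [0.1, 0.8]]
b = [0.0, 0.0]
for _ in range(3):                             # same cell reused at every iteration
    h = grnn_step(h, adj, W, U, b)
print(h)
```

Because the symmetric end nodes 0 and 2 start from identical states, they stay identical after every step. Swapping the mean for a sum or max, or the Elman cell for a GRU or LSTM, changes only `mean_aggregate` or the body of `grnn_step`.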

GRNNs may be stacked (multiple layers), applied iteratively (over graph-structured signals), or aligned in time to process sequences of graph snapshots or event streams (Yan et al., 2020, Hajiramezanali et al., 2019, Chen et al., 2023). The update may incorporate edge features, directional information, and multi-relational graphs (Ioannidis et al., 2018).
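The snapshot setting can be sketched as follows. The sketch assumes scalar node states, an edge-weighted sum aggregator (one simple way to use edge features), a tanh update with hypothetical fixed coefficients `a` and `c`, and an external input signal $x_v^{(t)}$ per snapshot (an assumption; the generalized update above has no input term). The hidden state is carried from one snapshot to the next while the edge set changes.

```python
import math
from typing import Dict, List, Tuple

# A snapshot: (edge weights keyed by (source, target), node inputs x_v^(t)).
Snapshot = Tuple[Dict[Tuple[int, int], float], Dict[int, float]]

def run_over_snapshots(snapshots: List[Snapshot], nodes: List[int],
                       a: float = 0.5, c: float = 0.5) -> List[Dict[int, float]]:
    """Carry scalar hidden states across a sequence of graph snapshots:
    h_v^(t) = tanh(a * h_v^(t-1) + c * sum_u w_uv^(t) h_u^(t-1) + x_v^(t)).
    Edges and weights w_uv^(t) may differ from snapshot to snapshot."""
    h = {v: 0.0 for v in nodes}
    history = []
    for edges, x in snapshots:
        msg = {v: 0.0 for v in nodes}
        for (u, v), w in edges.items():
            msg[v] += w * h[u]                 # edge-weighted sum aggregation
        h = {v: math.tanh(a * h[v] + c * msg[v] + x.get(v, 0.0)) for v in nodes}
        history.append(h)
    return history

# Node 0 receives an input first; the edges 0->1 and 1->2 appear in two orders.
forward = [({}, {0: 1.0}), ({(0, 1): 1.0}, {}), ({(1, 2): 1.0}, {})]
backward = [({}, {0: 1.0}), ({(1, 2): 1.0}, {}), ({(0, 1): 1.0}, {})]
print(run_over_snapshots(forward, [0, 1, 2])[-1][2])   # positive
print(run_over_snapshots(backward, [0, 1, 2])[-1][2])  # 0.0
```

Information from node 0 reaches node 2 only when the snapshots contain a time-respecting path from 0 through 1 to 2; with the edges presented in the reverse order, node 2's state stays at zero.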

2. Gating, Memory, and Depth: Overcoming GNN Limitations

Classical GNNs suffer from over-smoothing and the dilution of node features with increasing depth. GRNNs address this by introducing explicit gating—borrowed from RNNs—at each layer or iteration, allowing dynamic control over feature propagation and memory retention.

LSTM/GRU Gating

Typical LSTM-based updates within a GRNN for node $v$ are:

$\begin{aligned} i_v^{(t)} &= \sigma\left( W_i m_v^{(t)} + U_i h_v^{(t-1)} + b_i \right) \\ f_v^{(t)} &= \sigma\left( W_f m_v^{(t)} + U_f h_v^{(t-1)} + b_f \right) \\ o_v^{(t)} &= \sigma\left( W_o m_v^{(t)} + U_o h_v^{(t-1)} + b_o \right) \\ \tilde{c}_v^{(t)} &= \tanh\left( W_c m_v^{(t)} + U_c h_v^{(t-1)} + b_c \right) \\ c_v^{(t)} &= f_v^{(t)} \odot c_v^{(t-1)} + i_v^{(t)} \odot \tilde{c}_v^{(t)} \\ h_v^{(t)} &= o_v^{(t)} \odot \tanh\left( c_v^{(t)} \right) \end{aligned}$

where $m_v^{(t)}$ is an aggregation (possibly attention-weighted) of neighbor features, $\sigma$ is the logistic sigmoid, and $\odot$ denotes element-wise multiplication (Song, 2019, Li et al., 2019).
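A direct transcription of these gate equations into plain Python is sketched below. It assumes that $m_v^{(t)}$ is the mean of the neighbors' hidden states (so every $W_\ast$ is square), small random weights from a seeded generator, and zero biases. `init_params`, `gate`, and `graph_lstm_step` are hypothetical helper names, not an API from the cited works.

```python
import math
import random
from typing import Dict, List

Vec = List[float]
Params = Dict[str, list]

def matvec(M: List[Vec], x: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def init_params(d: int, seed: int = 0) -> Params:
    """Random W_g, U_g (d x d) and zero b_g for the gates g in {i, f, o, c}."""
    rng = random.Random(seed)
    p: Params = {}
    for g in "ifoc":
        p["W" + g] = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
        p["U" + g] = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
        p["b" + g] = [0.0] * d
    return p

def gate(p: Params, g: str, m: Vec, h: Vec, act) -> Vec:
    """act(W_g m + U_g h + b_g), applied element-wise."""
    pre, rec = matvec(p["W" + g], m), matvec(p["U" + g], h)
    return [act(x + y + b) for x, y, b in zip(pre, rec, p["b" + g])]

def graph_lstm_step(h, c, adj, p):
    """Update every node's (h, c) pair from the previous states, following the equations above."""
    d = len(p["bi"])
    new_h, new_c = {}, {}
    for v, nbrs in adj.items():
        # m_v: mean of neighbour hidden states (assumed choice of aggregation)
        m = [sum(h[u][k] for u in nbrs) / len(nbrs) for k in range(d)] if nbrs else [0.0] * d
        i = gate(p, "i", m, h[v], sigmoid)
        f = gate(p, "f", m, h[v], sigmoid)
        o = gate(p, "o", m, h[v], sigmoid)
        c_tilde = gate(p, "c", m, h[v], math.tanh)
        new_c[v] = [fk * ck + ik * tk for fk, ck, ik, tk in zip(f, c[v], i, c_tilde)]
        new_h[v] = [ok * math.tanh(ck) for ok, ck in zip(o, new_c[v])]
    return new_h, new_c

adj = {0: [1, 2], 1: [0], 2: [0]}              # star with centre 0
p = init_params(2)
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
c = {v: [0.0, 0.0] for v in adj}
for _ in range(4):
    h, c = graph_lstm_step(h, c, adj, p)
print(h)
```

The memory cell $c_v$ is what lets the forget gate carry information across many iterations; the leaves 1 and 2 start identically and therefore remain identical.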

Advantages

Mitigation of over-smoothing: Dynamic gates preserve discriminative signals over deep recurrences (Li et al., 2019, Huang et al., 2019).
Noise suppression: Gates filter irrelevant or noisy signals from distant neighbors.
Deep architectures: The gated RGNN of Huang et al. empirically supports very deep stacks (e.g., 10+ layers) and outperforms residual-based GNNs in their experiments (Huang et al., 2019).
Temporal memory: LSTM/GRU cells enable the retention of past information crucial for temporal and dynamic graph settings (Hajiramezanali et al., 2019, Chen et al., 2023).
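A toy numerical comparison makes the over-smoothing point concrete. It assumes scalar node features on a six-node path graph, plain mean propagation over each node's closed neighborhood as the ungated baseline, and a fixed keep-gate $z$ that mixes each node's previous state with that mean. The constant gate is a hypothetical stand-in for the learned, input-dependent gates of the cited models.

```python
from typing import Dict, List

def propagate(x: Dict[int, float], adj: Dict[int, List[int]], steps: int,
              z: float = 0.0) -> Dict[int, float]:
    """Repeat h_v <- z * h_v + (1 - z) * mean of h over v and its neighbours.
    z = 0 is plain mean propagation; a fixed z > 0 is a hypothetical 'keep' gate."""
    h = dict(x)
    for _ in range(steps):
        h = {v: z * h[v] + (1.0 - z) * (h[v] + sum(h[u] for u in nbrs)) / (1 + len(nbrs))
             for v, nbrs in adj.items()}
    return h

def spread(h: Dict[int, float]) -> float:
    """Gap between the largest and smallest node state (0 means fully smoothed)."""
    return max(h.values()) - min(h.values())

n = 6
adj = {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}   # path graph
x = {v: (1.0 if v == 0 else 0.0) for v in range(n)}                       # one distinctive node
plain = propagate(x, adj, steps=30)
gated = propagate(x, adj, steps=30, z=0.9)
print(spread(plain), spread(gated))
```

With these settings the gated run keeps a larger spread between node states after 30 steps. A constant gate only slows the convergence toward a uniform state rather than preventing it; the cited models instead learn input-dependent gates.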

3. Variants: Dynamic, Multi-Relational, and Stochastic GRNNs

Dynamic Graphs and Temporal Aggregation

Continuous-Time Dynamic Graphs (CTDGs): Node states are updated as temporally ordered events arrive. GRNNs process such event streams via event-based recurrence and are trained with backpropagation through time (BPTT) (Bravo et al., 2024, Chen et al., 2023).
Temporal Revision Mechanisms: RTRGN (Chen et al., 2023), for example, maintains a hidden state for each node that integrates all of its historical neighbors via node-specific RNNs, giving greater expressiveness than standard temporal GNNs.
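Event-based recurrence can be sketched as a per-node memory that a GRU cell updates whenever the node takes part in an event. The sketch assumes scalar memories, a message made of the other endpoint's memory, the time elapsed since the node's previous event, and a scalar edge feature, and hand-picked weights in `W`. It illustrates the update order only; it is not the RTRGN architecture or any specific CTDG model, and the BPTT training pass through this chain of updates is omitted.

```python
import math
from typing import Dict, List, Tuple

Event = Tuple[int, int, float, float]  # (source, destination, timestamp, edge feature)

# Hypothetical weights per gate: [w_other, w_dt, w_feat, w_recurrent, bias].
W = {"z": [0.5, -0.1, 0.5, 0.5, 0.0],
     "r": [0.5, 0.0, 0.5, 0.5, 0.0],
     "n": [1.0, -0.1, 1.0, 0.5, 0.0]}

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def affine(params: List[float], x: List[float], h: float) -> float:
    *wx, wh, bias = params
    return sum(a * b for a, b in zip(wx, x)) + wh * h + bias

def gru(h: float, x: List[float]) -> float:
    """Scalar GRU: h' = (1 - z) h + z n, with n = tanh(W_n x + U_n (r * h) + b_n)."""
    z = sigmoid(affine(W["z"], x, h))
    r = sigmoid(affine(W["r"], x, h))
    n = math.tanh(affine(W["n"], x, r * h))
    return (1.0 - z) * h + z * n

def process_events(events: List[Event]) -> Dict[int, float]:
    memory: Dict[int, float] = {}
    last_seen: Dict[int, float] = {}
    for src, dst, t, feat in sorted(events, key=lambda e: e[2]):  # strict temporal order
        h_src, h_dst = memory.get(src, 0.0), memory.get(dst, 0.0)
        dt_src = t - last_seen.get(src, t)      # 0 on a node's first event
        dt_dst = t - last_seen.get(dst, t)
        # Both endpoints read the pre-event memory of the other endpoint.
        memory[src] = gru(h_src, [h_dst, dt_src, feat])
        memory[dst] = gru(h_dst, [h_src, dt_dst, feat])
        last_seen[src] = last_seen[dst] = t
    return memory

events = [(0, 1, 1.0, 1.0), (1, 2, 2.0, 0.5), (0, 2, 3.5, 1.0)]
print(process_events(events))
```

Sorting by timestamp makes the result independent of the order in which events are listed, as long as timestamps are distinct.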

Multi-Relational GRNNs

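GRNNs can be extended to multi-relational settings by maintaining separate diffusion operators per relation type and adaptively mixing them via learnable weights (Ioannidis et al., 2018). For $I$ relation types with per-relation adjacency matrices $S^{(i)}$ (the slices of an adjacency tensor) and layer-$(\ell-1)$ features $Z^{(\ell-1)}$ indexed by node, relation, and channel, the diffused feature of node $n$ for relation $i$ and channel $c$ is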

$H_{n,i,c}^{(\ell)} = \sum_{m \in \mathcal N_n^{(i)}} S_{nm}^{(i)} Z_{m,i,c}^{(\ell-1)}$

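The diffused features are then linearly mixed across relations and channels with learnable weights. Because each diffusion step sums only over the neighbors $\mathcal N_n^{(i)}$, its cost grows with the number of edges per relation, which keeps the scheme flexible and scalable.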
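The diffusion step can be written with nested lists, storing $S^{(i)}_{nm}$ as `S[i][n][m]` and $Z_{m,i,c}$ as `Z[m][i][c]`. Zero entries of $S^{(i)}$ mark non-neighbors, so summing over all $m$ is the same as summing over $\mathcal N_n^{(i)}$. The `mix` function is one hypothetical linear parametrization, $Y_{n,k} = \sum_{i,c} A_{i,c,k} H_{n,i,c}$, chosen for illustration; it is not necessarily the exact mixing used by Ioannidis et al. (2018).

```python
from typing import List

Tensor3 = List[List[List[float]]]

def relation_diffusion(S: Tensor3, Z: Tensor3) -> Tensor3:
    """H[n][i][c] = sum_m S[i][n][m] * Z[m][i][c] for every node n, relation i, channel c."""
    N, R, C = len(Z), len(Z[0]), len(Z[0][0])
    return [[[sum(S[i][n][m] * Z[m][i][c] for m in range(N)) for c in range(C)]
             for i in range(R)]
            for n in range(N)]

def mix(H: Tensor3, A: Tensor3) -> List[List[float]]:
    """Hypothetical linear mixing: Y[n][k] = sum over i, c of A[i][c][k] * H[n][i][c]."""
    N, R, C, K = len(H), len(H[0]), len(H[0][0]), len(A[0][0])
    return [[sum(A[i][c][k] * H[n][i][c] for i in range(R) for c in range(C)) for k in range(K)]
            for n in range(N)]

# Three nodes, two relations (edge 0-1 in relation 0, edge 1-2 in relation 1), one channel.
S = [[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
     [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]]
Z = [[[1.0], [1.0]], [[2.0], [2.0]], [[3.0], [3.0]]]   # Z[m][i][c]
A = [[[0.5]], [[1.5]]]                                   # A[i][c][k], illustrative values
H = relation_diffusion(S, Z)
Y = mix(H, A)
print(H)  # [[[2.0], [0.0]], [[1.0], [3.0]], [[0.0], [2.0]]]
print(Y)  # [[1.0], [5.0], [3.0]]
```

A practical implementation would store each $S^{(i)}$ sparsely so that the inner sum visits only actual neighbors.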

Stochastic Latent State Extensions

Variational GRNNs (VGRNN): Node-level embeddings are augmented with stochastic latent variables $Z^{(t)}$, inferred by variational inference that maximizes an evidence lower bound (ELBO). These latent variables can capture uncertainty in graph evolution and allow modeling of dynamic, multimodal behavior (Hajiramezanali et al., 2019, Yan et al., 2020).
Semi-implicit posterior and KL regularization: Greater expressivity and robustness to posterior collapse are obtained with hierarchical noise injection and batch-normalization-based lower bounds on the KL divergence (Yan et al., 2020).
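Two ingredients of such variational models can be sketched directly: a reparameterized sample $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal N(0, I)$, and the closed-form KL divergence between diagonal Gaussians that appears in the ELBO. In VGRNN the prior's parameters are computed from the previous recurrent state; here both posterior and prior parameters are simply given numbers. The inner-product edge decoder is an assumed, commonly used choice, and the semi-implicit extensions are not covered.

```python
import math
import random
from typing import List

def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """z = mu + sigma * eps with eps ~ N(0, 1); in an autodiff framework this lets
    gradients flow to mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_diag_gaussians(mu_q, lv_q, mu_p, lv_p) -> float:
    """KL( N(mu_q, diag(exp(lv_q))) || N(mu_p, diag(exp(lv_p))) ), summed over dimensions."""
    return sum(0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
               for mq, lq, mp, lp in zip(mu_q, lv_q, mu_p, lv_p))

def edge_probability(z_u: List[float], z_v: List[float]) -> float:
    """Inner-product decoder: p(A_uv = 1 | z_u, z_v) = sigmoid(z_u . z_v)."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z_u, z_v))))

rng = random.Random(0)
z_u = reparameterize([0.2, -0.1], [-1.0, -1.0], rng)
z_v = reparameterize([0.3, 0.0], [-1.0, -1.0], rng)
print(edge_probability(z_u, z_v))
print(kl_diag_gaussians([1.0], [0.0], [0.0], [0.0]))   # 0.5
```

A per-snapshot ELBO term would combine the log-likelihood of the observed adjacency under `edge_probability` with the negative of `kl_diag_gaussians` between posterior and prior.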

4. Theory: Expressiveness and Computation

Logical and Automata-Theoretic Characterization

GRNNs with real-valued computation match the expressive power of infinitary graded modal logic ($\omega$-GML), while float-bounded GRNNs correspond to a rule-based modal logic with counting (GMSC) (Ahvonen et al., 2024). The two characterizations coincide relative to properties definable in monadic second-order logic (MSO), and GRNNs are further equivalent to (bounded) counting message-passing automata (CMPA) from distributed computing. This situates GRNNs within classical distributed-automata and modal-logic frameworks.
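To make the logical view concrete, the sketch below runs a hand-written recurrent message-passing rule with Boolean states and a counting threshold: a node becomes true if it carries the label $p$ or at least $k$ of its neighbors are already true, a recursive rule of the form $X \leftarrow p \lor \Diamond_{\geq k} X$ in graded modal notation. For $k = 1$ it decides whether a $p$-labeled node is reachable, which matches the infinitary disjunction $\bigvee_{n \ge 0} \Diamond^n p$. The function name and rule are illustrative and are not the constructions used by Ahvonen et al. (2024).

```python
from typing import Dict, List, Set

def counting_fixpoint(adj: Dict[int, List[int]], labeled: Set[int], k: int = 1) -> Dict[int, bool]:
    """Boolean recurrent message passing: X_v <- p(v) or (#true neighbours >= k).
    States only switch from False to True, so at most |V| rounds reach a fixed point."""
    state = {v: v in labeled for v in adj}
    for _ in range(len(adj)):
        new = {v: state[v] or sum(state[u] for u in nbrs) >= k for v, nbrs in adj.items()}
        if new == state:
            break
        state = new
    return state

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
print(counting_fixpoint(path, {3}))           # nodes 0-3 True, isolated node 4 False
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(counting_fixpoint(star, {1, 2}, k=2))   # centre True (two true leaves), leaf 3 False
```

The number of rounds is not fixed in advance; this unbounded recurrence is what separates such properties from those of a fixed-depth GNN.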

Equivalence to Arithmetic Circuits

Recurrent GNNs are exactly as expressive as recurrent arithmetic circuits over the reals, up to encoding differences (Barlag et al., 2026). Any GRNN can be simulated by a recurrent arithmetic circuit with "memory gates," and vice versa, using formal translations of aggregation and combination functions. This delineates an exact computational boundary for GRNNs that depends on the complexity (depth, circuit class) of their component operators.
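As a toy illustration of the GRNN-to-circuit direction, the sketch below encodes a two-node GRNN with sum aggregation and the polynomial update $h_v^{(t)} = a\,h_v^{(t-1)} + b \sum_{u \in \mathcal N(v)} h_u^{(t-1)}$ as a circuit of constant, addition, multiplication, and memory gates. The gate encoding, the `run_circuit` evaluator, and the coefficients are hypothetical simplifications, not the formal model of Barlag et al.; non-polynomial activations such as $\tanh$ are deliberately left out. The in-block `assert` compares the circuit with a direct GRNN computation.

```python
from typing import Dict, List, Tuple

# Gate specs: ("const", c) | ("add", g1, g2) | ("mul", g1, g2) | ("mem", init, source_gate)
Gate = Tuple

def run_circuit(gates: Dict[str, Gate], order: List[str], steps: int) -> List[Dict[str, float]]:
    """Each step: memory gates emit their stored value, the remaining gates are evaluated
    in the given topological order, then each memory gate stores its source's new value."""
    stored = {g: spec[1] for g, spec in gates.items() if spec[0] == "mem"}
    trace = []
    for _ in range(steps):
        val = dict(stored)
        for g in order:
            kind, *args = gates[g]
            if kind == "const":
                val[g] = args[0]
            elif kind == "add":
                val[g] = val[args[0]] + val[args[1]]
            elif kind == "mul":
                val[g] = val[args[0]] * val[args[1]]
        stored = {g: val[gates[g][2]] for g in stored}
        trace.append(val)
    return trace

a, b = 0.5, 0.25                                         # illustrative coefficients
gates = {
    "h0": ("mem", 1.0, "n0"), "h1": ("mem", 0.0, "n1"),  # node states h_v^(t-1)
    "a": ("const", a), "b": ("const", b),
    "ah0": ("mul", "a", "h0"), "bh1": ("mul", "b", "h1"), "n0": ("add", "ah0", "bh1"),
    "ah1": ("mul", "a", "h1"), "bh0": ("mul", "b", "h0"), "n1": ("add", "ah1", "bh0"),
}
order = ["a", "b", "ah0", "bh1", "n0", "ah1", "bh0", "n1"]
trace = run_circuit(gates, order, steps=5)

# Reference GRNN on the single edge 0 - 1 (sum aggregation over one neighbour).
adj = {0: [1], 1: [0]}
h = {0: 1.0, 1: 0.0}
for t in range(5):
    h = {v: a * h[v] + b * sum(h[u] for u in adj[v]) for v in adj}
    assert h[0] == trace[t]["n0"] and h[1] == trace[t]["n1"]   # circuit matches GRNN
print(trace[-1]["n0"], trace[-1]["n1"])
```

Memory gates play the role of the recurrent node states, while the + and × gates implement one round of aggregation and combination.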

5. Applications and Empirical Evaluation

GRNNs have demonstrated state-of-the-art or competitive performance across a spectrum of tasks:

Graph-level prediction: GraphLSTM with sequence-sampled nodes, Gumbel-Softmax random walks, and neighborhood-aware LSTM achieves strong accuracy and fast convergence on chemical and bioinformatics datasets (Jin et al., 2018).
Text classification: ReGNN (layerwise LSTM gating, global nodes) outperforms both sequential (LSTM, Transformer) and standard graph (GCN, GraphSAGE) models on a range of single- and multi-label text benchmarks. LSTM-style gating is critical for resisting over-smoothing and maintaining representational discrimination at depth (Li et al., 2019).
Dynamic link prediction: VGRNN, SI-VGRNN, and SGRNN variants report leading results on dynamic link prediction over evolving networks, using hierarchical stochastic state modeling (Hajiramezanali et al., 2019, Yan et al., 2020).
Video analysis: Space-time GRNNs (RSTG) and GNN+RNN hybrids yield superior performance in video action recognition and video instance segmentation, leveraging explicit recurrence for temporal memory and spatial message passing for object interactions (Nicolicioiu et al., 2019, Johnander et al., 2020).
Algorithm learning and extrapolation: Recurrent GNNs with skip connections, state regularization, and edge convolutions can be trained on small graphs and deployed on much larger instances without degradation, a property essential for tasks such as pathfinding and prefix sums on graphs of arbitrary size (Grötschla et al., 2022).
Semi-supervised classification: Multi-relational GRNNs, via learnable diffusion and regularization, outperform standard GCNs on node classification benchmarks and exhibit robustness to noisy features and graph edges (Ioannidis et al., 2018).
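The extrapolation idea can be illustrated with a hand-coded, not learned, recurrent update for single-source shortest paths: every node applies the same min-plus rule $d_v \leftarrow \min\big(d_v, \min_{u \in \mathcal N(v)} (d_u + w_{uv})\big)$, and only the number of recurrent iterations, not the parameters, grows with the graph. This Bellman-Ford-style rule is an assumed stand-in for what a trained model would have to approximate; it is not the architecture of Grötschla et al. (2022). Edge weights are assumed non-negative, which guarantees that the iteration stops.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]   # node -> list of (in-neighbour, edge weight)

def recurrent_shortest_paths(adj: Graph, source: int) -> Dict[int, float]:
    """Repeat d_v <- min(d_v, min_u d_u + w_uv) at every node until no state changes."""
    d = {v: (0.0 if v == source else math.inf) for v in adj}
    while True:
        new = {v: min([d[v]] + [d[u] + w for u, w in adj[v]]) for v in adj}
        if new == d:
            return d
        d = new

def path_graph(n: int) -> Graph:
    """Undirected path 0 - 1 - ... - (n-1) with unit weights."""
    return {v: [(u, 1.0) for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}

small = recurrent_shortest_paths(path_graph(4), source=0)
large = recurrent_shortest_paths(path_graph(200), source=0)   # same rule, more iterations
print(small[3], large[199])   # 3.0 199.0
```

Running until a fixed point, rather than for a fixed number of layers, is what allows the same update to handle graphs much larger than any seen during training.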

6. Model Properties and Limitations

Invariances and Stability

Permutation equivariance: Relabeling the nodes of the input graph permutes the GRNN's node-level outputs in the same way, provided the aggregation is permutation invariant and the same update function is applied at every node (Ruiz et al., 2020).
Stability to graph perturbations: Output changes scale polynomially in time and with the size of the perturbation, with higher-order stability bounds for fully gated architectures.
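Permutation equivariance can be checked numerically: run the same recurrence on a graph and on a relabeled copy, then compare node by node. The sketch assumes scalar states, sum aggregation, and a tanh update with hypothetical coefficients; the relabeling `pi` is arbitrary.

```python
import math
from typing import Dict, List

def run(h: Dict[int, float], adj: Dict[int, List[int]], steps: int,
        a: float = 0.7, b: float = 0.3) -> Dict[int, float]:
    """h_v <- tanh(a * h_v + b * sum of neighbour states), applied synchronously."""
    for _ in range(steps):
        h = {v: math.tanh(a * h[v] + b * sum(h[u] for u in adj[v])) for v in adj}
    return h

def relabel(h, adj, pi):
    """Rename every node v to pi[v] in both the states and the adjacency lists."""
    return ({pi[v]: x for v, x in h.items()},
            {pi[v]: [pi[u] for u in nbrs] for v, nbrs in adj.items()})

adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
h0 = {0: 0.1, 1: -0.4, 2: 0.9, 3: 0.0}
pi = {0: 2, 1: 0, 2: 3, 3: 1}

out = run(h0, adj, steps=5)
out_perm = run(*relabel(h0, adj, pi), steps=5)
# Equivariance: the output at node pi[v] of the relabeled graph equals the output at v.
print(all(math.isclose(out_perm[pi[v]], out[v]) for v in adj))   # True
```

Using `math.isclose` rather than exact equality allows for floating-point differences when an implementation sums neighbors in a different order.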

Practical concerns

Parameter efficiency: Architectures such as FGRNN use weighted residual connections to obtain the stability benefits of gating with far fewer parameters than fully gated structures (Kadambari et al., 2020).
BPTT truncation gap: On long event sequences (CTDGs), truncated backpropagation through time can severely limit the ability to capture long-range dependencies, producing measurable performance gaps (Bravo et al., 2024). Possible remedies include adaptive truncation, memory-augmented models, and unbiased online gradient approximations.
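The mechanism behind the truncation gap can be isolated in a scalar linear recurrence $h_t = a\,h_{t-1} + x_t$ with loss $L = h_T$, whose exact gradient is $\partial L / \partial x_k = a^{T-k}$. This toy model (not a GRNN) and the function name `bptt_grads` are assumptions made for illustration: the sketch computes the gradient by an explicit backward pass, optionally cut after `window` steps as in truncated BPTT.

```python
from typing import List, Optional

def bptt_grads(a: float, T: int, window: Optional[int] = None) -> List[float]:
    """Gradients dL/dx_t (t = 1..T) for L = h_T and h_t = a * h_{t-1} + x_t."""
    grads = [0.0] * T
    g = 1.0                       # dL/dh_T
    for k, t in enumerate(range(T, 0, -1)):
        if window is not None and k >= window:
            break                 # truncated: no gradient flows to earlier steps
        grads[t - 1] = g          # dh_t/dx_t = 1, so dL/dx_t = dL/dh_t
        g *= a                    # chain rule through dh_t/dh_{t-1} = a
    return grads

full = bptt_grads(0.5, T=10)
truncated = bptt_grads(0.5, T=10, window=3)
print(full[0], truncated[0])   # 0.001953125 0.0
```

With a window of 3, inputs more than three steps in the past receive exactly zero gradient even though they still influence $L$, so a dependency of that range cannot be learned from this signal.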

Open Challenges

Dynamic graph support: Many GRNNs assume static or slowly evolving graphs; fully dynamic structures, in which nodes and edges appear and disappear over time, add complexity to both state management and training.
Scalability in multi-relational or high-degree settings: Memory and computational cost can increase rapidly; some solutions introduce sparse mixing, attention, or low-rank regularization.
Combining gating and attention: The interplay between spatial gating, attention, and recurrent memory is a promising area for further exploration.

7. Taxonomy and Representative Models

Model/Component	Setting	Distinctive Feature(s)
RGNN (Huang et al., 2019)	Static graphs	Gating via GRU/LSTM across layers
ReGNN (Li et al., 2019)	Text graphs	Layerwise LSTM, global node gating
FGRNN (Kadambari et al., 2020)	Signals on graphs	Weighted residuals for stability
RSTG (Nicolicioiu et al., 2019)	Video	Interleaved space/time recurrence
SGRNN/VGRNN (Yan et al., 2020, Hajiramezanali et al., 2019)	Dynamic graphs	Stochastic latent states with VI
Multi-relational GRNN (Ioannidis et al., 2018)	Multi-layer graphs	Learnable mixing of relations
RTRGN (Chen et al., 2023)	Temporal graphs	Recurrent temporal neighbor revision
R-GNN (Huang et al., 2021)	Online forums	Post-wise GCN + temporal GRU

References

(Jin et al., 2018) Learning Graph-Level Representations with Recurrent Neural Networks
(Ioannidis et al., 2018) A Recurrent Graph Neural Network for Multi-Relational Data
(Huang et al., 2019) Residual or Gate? Towards Deeper Graph Neural Networks for Inductive Graph Representation Learning
(Song, 2019) Tackling Graphical NLP problems with Graph Recurrent Networks
(Hajiramezanali et al., 2019) Variational Graph Recurrent Neural Networks
(Li et al., 2019) Recursive Graphical Neural Networks for Text Classification
(Kadambari et al., 2020) Fast Graph Convolutional Recurrent Neural Networks
(Ruiz et al., 2020) Gated Graph Recurrent Neural Networks
(Yan et al., 2020) Stochastic Graph Recurrent Neural Network
(Johnander et al., 2020) Learning Video Instance Segmentation with Recurrent Graph Neural Networks
(Huang et al., 2021) Recurrent Graph Neural Networks for Rumor Detection in Online Forums
(Grötschla et al., 2022) Learning Graph Algorithms With Recurrent Graph Neural Networks
(Chen et al., 2023) Recurrent Temporal Revision Graph Networks
(Ahvonen et al., 2024) Logical Characterizations of Recurrent Graph Neural Networks with Reals and Floats
(Bravo et al., 2024) Mind the truncation gap: challenges of learning on dynamic graphs with recurrent architectures
(Barlag et al., 2026) Recurrent Graph Neural Networks and Arithmetic Circuits