RL-DMF: Dynamic Multi-Graph Fusion
- RL-DMF is a framework that adaptively fuses multi-graph data using RL-driven feature selection for enhanced spatiotemporal predictions.
- It combines dynamic graph construction with reinforcement learning to intelligently mask less critical features, improving prediction accuracy.
- Empirical results indicate significant RMSE improvements and clear feature rankings, providing both precise forecasting and interpretability.
Reinforcement Learning-guided Dynamic Multi-Graph Fusion (RL-DMF) is a framework designed to adaptively and interpretably aggregate heterogeneous graph-structured information for spatiotemporal prediction tasks. RL-DMF integrates reinforcement learning (RL) for feature selection and ranking with dynamic multi-graph attention fusion, offering both predictive accuracy and post hoc interpretability, especially in scenarios such as evacuation traffic forecasting (Rafi et al., 10 Jan 2026).
1. Architecture and Workflow
RL-DMF comprises two tightly coupled modules: (i) an RL-based intelligent feature selection and ranking module (RL-IFSR), and (ii) a dynamic multi-graph fusion module (DMF). The workflow proceeds as follows:
- Inputs:
- Temporal feature tensors representing the time series of each detector over a window of preceding time steps.
- Static spatial feature tensors for each detector.
- RL-IFSR: Observes the state $s_t$, selects a single feature index $a_t$ for masking, and receives the reward $r_t$ (the negative prediction loss).
- Feature Masking: The selected feature is zeroed out in the current batch via binary masks over the temporal and spatial feature tensors.
- Dynamic Multi-Graph Construction & Fusion: At each time step, two dynamic graphs are built to represent different spatial relations: a distance-based graph $G^{d}_t$ and a travel-time-based graph $G^{tt}_t$. Node features, after RL-masking, are processed by individual single-layer GCNs, yielding $H^{d}_t$ and $H^{tt}_t$, which are then fused by node-wise attention into a single embedding $H_t$.
- Temporal Modeling and Prediction: The fused node embeddings are passed through an LSTM and a linear prediction layer to yield the final predictions $\hat{y}$.
- Training: At each mini-batch iteration, RL-driven feature selection and masking are performed, DMF computes predictions, the MSE loss is backpropagated, and the RL module receives the negative loss as a reward to update its policy.
This joint optimization encourages both robust spatiotemporal modeling and interpretability by revealing which features are deemed non-critical by the RL agent (Rafi et al., 10 Jan 2026).
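To make the loop concrete, here is a minimal, self-contained sketch of the masking-and-reward cycle. The DMF forward pass is replaced by a stub loss and the DDQN by a tabular, bandit-style learner; the feature count, importance values, and all names are illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)

NUM_FEATURES = 8  # hypothetical feature count for the sketch

def dmf_predict_loss(masked_out: int) -> float:
    """Stand-in for the DMF forward pass + MSE loss.
    Here we pretend feature 0 is critical: masking it hurts the most."""
    importance = [0.9, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.0]
    return 1.0 + importance[masked_out]

def select_action(q_values, epsilon):
    """Epsilon-greedy over feature indices (the RL-IFSR action space)."""
    if random.random() < epsilon:
        return random.randrange(NUM_FEATURES)
    return max(range(NUM_FEATURES), key=lambda a: q_values[a])

# Tabular stand-in for the DDQN: one Q-value per feature-masking action.
q_values = [0.0] * NUM_FEATURES
epsilon, alpha = 1.0, 0.1

for step in range(500):
    a = select_action(q_values, epsilon)           # pick one feature to mask
    loss = dmf_predict_loss(a)                     # DMF forward + MSE (stubbed)
    reward = -loss                                 # negative prediction loss
    q_values[a] += alpha * (reward - q_values[a])  # bandit-style update
    epsilon = max(0.05, epsilon * 0.995)           # decay exploration

# The agent learns that masking the least important feature is cheapest.
best = max(range(NUM_FEATURES), key=lambda a: q_values[a])
print(best)  # → 7 (the feature whose removal hurts prediction least)
```

Frequently masked features in this loop are exactly those the agent has learned are cheap to remove, which is the basis for the interpretability claim discussed later.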
2. Graph Construction and Fusion Mechanism
At the core of DMF are multiple graph representations of the sensor network:
- Graph Construction: For every time step $t$, let $V_t$ denote the set of currently active detectors. Two graphs are instantiated:
- $G^{d}_t = (V_t, E_t, A^{d}_t)$, with $A^{d}_{t,ij} = d_{ij}$ for each edge $(i,j)$, representing pairwise geographic distances.
- $G^{tt}_t = (V_t, E_t, A^{tt}_t)$, with $A^{tt}_{t,ij} = tt_{ij}$ for each edge $(i,j)$, where $tt_{ij}$ encodes real-time travel time (distance over average velocity).
Both adjacency matrices are min–max normalized to $[0,1]$ and symmetrically normalized, $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, for downstream GCN use.
- GCN and Attention Fusion: Node features $X_t$ are updated via
$H^{d}_t = \sigma(\hat{A}^{d}_t X_t W^{d})$ and $H^{tt}_t = \sigma(\hat{A}^{tt}_t X_t W^{tt})$.
The two GCN output tensors are stacked, and learnable attention vectors compute node-specific fusion weights,
$(\alpha^{d}_i, \alpha^{tt}_i) = \mathrm{softmax}\big(a^{\top} H^{d}_{t,i},\; a^{\top} H^{tt}_{t,i}\big).$
The final node embedding is
$H_{t,i} = \alpha^{d}_i H^{d}_{t,i} + \alpha^{tt}_i H^{tt}_{t,i},$
capturing both static and dynamic spatial dependencies (Rafi et al., 10 Jan 2026).
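The construction and fusion steps above can be sketched in NumPy. The distances, uniform average speed, tensor dimensions, and the shared attention vector `a` are illustrative assumptions; the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, H = 4, 3, 5  # detectors, input features, hidden size (illustrative)

# Raw pairwise quantities: distances, and travel times = distance / avg speed.
dist = np.array([[0, 2, 5, 9],
                 [2, 0, 3, 7],
                 [5, 3, 0, 4],
                 [9, 7, 4, 0]], dtype=float)
speed = 10.0  # assumed uniform average velocity for the sketch
tt = dist / speed

def normalize(A):
    """Min-max scale to [0, 1], add self-loops, then apply the symmetric
    normalization D^{-1/2} (A + I) D^{-1/2} used before a GCN layer."""
    A = (A - A.min()) / (A.max() - A.min())
    A = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

X = rng.normal(size=(N, F))  # node features (post RL-masking)
Wd, Wtt = rng.normal(size=(F, H)), rng.normal(size=(F, H))

relu = lambda z: np.maximum(z, 0.0)
Hd = relu(normalize(dist) @ X @ Wd)   # distance-graph GCN output
Htt = relu(normalize(tt) @ X @ Wtt)   # travel-time-graph GCN output

# Node-wise attention fusion: score each view per node, softmax, weighted sum.
a = rng.normal(size=(H,))             # learnable attention vector (assumed shared)
scores = np.stack([Hd @ a, Htt @ a], axis=1)         # shape (N, 2)
w = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
H_fused = w[:, 0:1] * Hd + w[:, 1:2] * Htt

print(H_fused.shape)  # (4, 5); attention weights sum to 1 per node
```

In a trained model, `Wd`, `Wtt`, and `a` would be learned end-to-end; here they are random draws purely to exercise the shapes and the fusion arithmetic.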
3. RL-Based Intelligent Feature Selection and Ranking (RL-IFSR)
RL-IFSR uses a Double DQN (DDQN) to learn a dynamic feature masking policy:
- State space: Compact feature vector summarizing mean temporal and spatial features for the current batch.
- Action space: Indexing over all possible features; each action masks a single feature.
- Reward: Negative prediction MSE, thus incentivizing policies that retain informative features.
- Policy training: DDQN with prioritized replay and ε-greedy exploration, using the standard Double DQN target
$y_t = r_t + \gamma\, Q_{\theta^-}\!\big(s_{t+1}, \arg\max_{a} Q_{\theta}(s_{t+1}, a)\big).$
Over training, features more often masked are inferred to be less critical for prediction, providing a means to rank feature importance (Rafi et al., 10 Jan 2026).
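As a sketch, the Double DQN target used in such updates (the standard DDQN formulation, not a paper-specific variant) decouples action selection from action evaluation; the toy Q-values below are illustrative.

```python
import numpy as np

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.95):
    """Double DQN target: the online net picks the argmax action,
    the target net evaluates it (reducing overestimation bias)."""
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

# Toy next-state Q-values over a 4-feature action space.
q_online_next = np.array([0.1, 0.8, 0.3, 0.2])  # online network
q_target_next = np.array([0.2, 0.5, 0.9, 0.1])  # target network
y = ddqn_target(reward=-1.0, q_online_next=q_online_next,
                q_target_next=q_target_next)
print(y)  # -1.0 + 0.95 * 0.5 = -0.525
```

Note that a vanilla DQN would instead take `max(q_target_next)` (0.9 here), yielding a more optimistic target; the decoupling is what DDQN contributes.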
4. Training Objectives and Empirical Performance
The RL-DMF architecture is optimized via two objectives:
- Traffic Prediction Loss:
$\mathcal{L}_{\text{pred}} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{\tau=1}^{T}\big(y_{i,\tau} - \hat{y}_{i,\tau}\big)^2$
(MSE over $N$ detectors and $T$ prediction steps).
- DDQN Loss: the temporal-difference loss of the Double DQN update described in Section 3.
The modules are trained jointly, alternating between end-to-end DMF backpropagation and RL-IFSR policy updates.
Empirical results (Milton, Hurricane 2024):
RL-DMF achieves a 1-hour RMSE of 293.9 (95% accuracy) and a 6-hour RMSE of 426.4 (90% accuracy), outperforming LSTM, CNN-LSTM, and strong GCN-LSTM baselines, as well as single-graph DDQN-masked variants. Ablation studies show that attention-based multi-graph fusion reduces RMSE by 5–9 points compared to single-graph models, and RL-driven masking yields a further ∼22-point gain (Rafi et al., 10 Jan 2026).
| Model | RMSE | MAE | MAPE (%) | Accuracy |
|---|---|---|---|---|
| LSTM | 481.0 | 324.2 | 29.6 | 0.87 |
| CNN-LSTM | 537.1 | 357.5 | 28.5 | 0.84 |
| Static GCN-LSTM | 480.1 | 308.7 | 25.6 | 0.87 |
| Dynamic-Dist GCN-LSTM | 482.6 | 313.9 | 26.0 | 0.87 |
| Dynamic-TT GCN-LSTM | 474.9 | 310.2 | 26.2 | 0.88 |
| RL-DMF (proposed) | 426.4 | 281.1 | 25.2 | 0.90 |
5. Interpretability and Feature Importance
RL-IFSR offers a straightforward interpretability mechanism: the masking frequency of each feature across training reflects its predictive importance. Features that are rarely masked are critical; those often masked are redundant or less useful.
Empirical ranking reveals the most important predictors for evacuation traffic forecasting to be:
- Recent traffic volumes,
- Previous-day and previous-period mean/variance,
- Day-of-week, hour-of-day, time-to-landfall indicators,
- Cumulative evacuated population.
Variables such as incident flags, lane closures, and road-type encodings exhibit higher masking frequencies, indicating lower relevance. This dynamic feature ranking aligns with domain expectations in evacuation scenarios (Rafi et al., 10 Jan 2026).
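A minimal sketch of this ranking procedure, using a hypothetical masking log and feature names (the counts below are invented for illustration):

```python
from collections import Counter

# Hypothetical log of which feature the agent masked at each training step.
mask_log = (["incident_flag"] * 40 + ["lane_closure"] * 35 +
            ["road_type"] * 30 + ["hour_of_day"] * 5 +
            ["recent_volume"] * 2)

counts = Counter(mask_log)
# Least-masked features are ranked most important for prediction.
ranking = sorted(counts, key=lambda f: counts[f])
print(ranking[0])   # most important  → "recent_volume"
print(ranking[-1])  # least important → "incident_flag"
```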
6. Implementation Details and Considerations
- Graph backbones: a single-layer GCN per graph.
- Temporal modeling: an LSTM over the historical input sequence, with prediction horizons of 1 to 6 hours.
- Optimization: Adam for the DMF module (batch size 32); DDQN with a replay buffer of 20,000 transitions, batch size 64, discount factor $\gamma$, and ε decayed from 1.0 to 0.05 at a rate of 0.995 per epoch.
- Graph dynamics: at each time step $t$, the node set contains only online sensors, with edges between currently active pairs; adjacency matrices are renormalized for GCN input at every step.
This framework provides a template for general spatiotemporal multi-graph prediction tasks demanding both adaptability (via dynamic graph fusion) and explainability (via RL-based masking), with validated gains under realistic evacuation conditions (Rafi et al., 10 Jan 2026).