
RL-DMF: Dynamic Multi-Graph Fusion

Updated 17 January 2026
  • RL-DMF is a framework that adaptively fuses multi-graph data using RL-driven feature selection for enhanced spatiotemporal predictions.
  • It combines dynamic graph construction with reinforcement learning to intelligently mask less critical features, improving prediction accuracy.
  • Empirical results indicate significant RMSE improvements and clear feature rankings, providing both precise forecasting and interpretability.

Reinforcement Learning-guided Dynamic Multi-Graph Fusion (RL-DMF) is a framework designed to adaptively and interpretably aggregate heterogeneous graph-structured information for spatiotemporal prediction tasks. RL-DMF integrates reinforcement learning (RL) for feature selection and ranking with dynamic multi-graph attention fusion, offering both predictive accuracy and post hoc interpretability, especially in scenarios such as evacuation traffic forecasting (Rafi et al., 10 Jan 2026).

1. Architecture and Workflow

RL-DMF comprises two tightly coupled modules: (i) an RL-based intelligent feature selection and ranking module (RL-IFSR), and (ii) a dynamic multi-graph fusion module (DMF). The workflow proceeds as follows:

  • Inputs:
    • Temporal feature tensors $\tilde X_\mathrm{temp} \in \mathbb{R}^{N_t \times l \times F_t}$ representing the time series of each detector over the previous $l$ time steps.
    • Static spatial feature tensors $\tilde X_\mathrm{spatial} \in \mathbb{R}^{N_t \times F_s}$ for each detector.
  • RL-IFSR: Observes the state $s = \left[\mathrm{mean}_{b,t}(X_\mathrm{temp}) \,\Vert\, \mathrm{mean}_b(X_\mathrm{spatial})\right] \in \mathbb{R}^{F_t + F_s}$, selects a single feature index $a \in \{0, \dots, F_t + F_s - 1\}$ for masking, and receives the reward $r = -\mathcal{L}_\mathrm{pred}(\hat y, y)$ (negative prediction loss).
  • Feature Masking: The selected feature is zeroed in the current batch via binary masks $m_\mathrm{temp}$ and $m_\mathrm{spatial}$.
  • Dynamic Multi-Graph Construction & Fusion: At each time step, two dynamic graphs are built to capture different spatial relations: a distance-based graph $G_t^d$ and a travel-time-based graph $G_t^{tt}$. Node features, after RL masking, are processed by separate single-layer GCNs, yielding $Z_t^d$ and $Z_t^{tt}$, which are then fused by node-wise attention into a single $Z_t^\mathrm{fused}$.
  • Temporal Modeling and Prediction: The fused node embeddings are passed through an LSTM and a linear prediction layer to yield the final predictions $\hat y$.
  • Training: In each mini-batch iteration, the RL agent selects and masks a feature, DMF computes predictions, the MSE loss is backpropagated, and the RL module receives the negative loss as its reward for a policy update.

This joint optimization encourages both robust spatiotemporal modeling and interpretability by revealing which features are deemed non-critical by the RL agent (Rafi et al., 10 Jan 2026).
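The per-batch state construction and masking step above can be sketched as follows; this is a minimal numpy illustration with hypothetical dimensions and a random stand-in for the learned policy, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N_t detectors, l past steps, F_t temporal / F_s spatial features.
N_t, l, F_t, F_s = 5, 6, 4, 3
X_temp = rng.random((N_t, l, F_t))
X_spatial = rng.random((N_t, F_s))

# State: batch-mean temporal features concatenated with batch-mean spatial features.
state = np.concatenate([X_temp.mean(axis=(0, 1)), X_spatial.mean(axis=0)])
assert state.shape == (F_t + F_s,)

# The RL agent picks one feature index to mask (here: a random stand-in for the policy).
action = int(rng.integers(0, F_t + F_s))

# Binary masks m_temp / m_spatial zero out the chosen feature in the current batch.
m_temp = np.ones(F_t)
m_spatial = np.ones(F_s)
if action < F_t:
    m_temp[action] = 0.0
else:
    m_spatial[action - F_t] = 0.0

X_temp_masked = X_temp * m_temp            # broadcasts over (N_t, l, F_t)
X_spatial_masked = X_spatial * m_spatial   # broadcasts over (N_t, F_s)
```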

2. Graph Construction and Fusion Mechanism

At the core of DMF are multiple graph representations of the sensor network:

  • Graph Construction: For every time $t$, let $V_t$ denote the set of currently active detectors. Two graphs are instantiated:
    • $G_t^d = (V_t, E_t^d, A_t^d)$, with $A_t^d[i,j] = d(i,j)$ for each edge $(i,j) \in E_t^d$, representing pairwise geographic distances.
    • $G_t^{tt} = (V_t, E_t^{tt}, A_t^{tt})$, with $A_t^{tt}[i,j] = tt_t(i,j) = \frac{d(i,j)}{(v_t(i) + v_t(j))/2}$ for each edge $(i,j) \in E_t^{tt}$, where $tt_t(i,j)$ encodes real-time travel time (distance over the average velocity of the endpoint detectors).

Both adjacency matrices are min–max normalized to $[0,1]$ and then symmetrically normalized, $\tilde A_t^k = D_k^{-1/2} A_t^k D_k^{-1/2}$, for downstream GCN use.
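A minimal numpy sketch of this graph construction and normalization, with toy detector coordinates and speeds (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy detector network: pairwise distances d(i, j) and per-node speeds v_t(i).
N = 4
coords = rng.random((N, 2)) * 10.0
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
v = rng.uniform(20.0, 60.0, size=N)

# Travel-time adjacency: tt_t(i, j) = d(i, j) / ((v_t(i) + v_t(j)) / 2).
tt = d / ((v[:, None] + v[None, :]) / 2.0)

def normalize(A):
    """Min-max scale to [0, 1], then symmetric normalization D^{-1/2} A D^{-1/2}."""
    A = (A - A.min()) / (A.max() - A.min())
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg, 1.0) ** -0.5 * (deg > 0)
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

A_d = normalize(d)    # distance-based graph
A_tt = normalize(tt)  # travel-time-based graph
```

Because both raw matrices are symmetric, the symmetric normalization preserves that symmetry, which is what the downstream GCN layers assume.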

  • GCN and Attention Fusion: Node features $H_t \in \mathbb{R}^{N_t \times (F_t + F_s)}$ are updated via

$$Z_t^k = \mathrm{ReLU}\left(\tilde A_t^k\, H_t\, W^k\right), \quad k \in \{d, tt\}.$$

The two GCN output tensors are stacked, and learnable attention vectors $w^k$ compute node-specific fusion weights,

$$\alpha_{t,i}^k = \frac{\exp\left(Z_{t,i}^k \cdot w^k\right)}{\sum_{k'} \exp\left(Z_{t,i}^{k'} \cdot w^{k'}\right)}.$$

The final node embedding is

$$Z_{t,i}^\mathrm{fused} = \sum_k \alpha_{t,i}^k\, Z_{t,i}^k,$$

capturing both static and dynamic spatial dependencies (Rafi et al., 10 Jan 2026).
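The GCN-plus-attention fusion above can be sketched in numpy; the normalized adjacencies, weight matrices, and attention vectors below are random stand-ins for the learned quantities:

```python
import numpy as np

rng = np.random.default_rng(2)

N, F, H = 4, 7, 8  # nodes, input features (F_t + F_s), hidden size
H_t = rng.random((N, F))
A = {k: rng.random((N, N)) for k in ("d", "tt")}      # stand-in normalized adjacencies
W = {k: rng.standard_normal((F, H)) * 0.1 for k in ("d", "tt")}
w = {k: rng.standard_normal(H) for k in ("d", "tt")}  # attention vectors

# Per-graph GCN layer: Z^k = ReLU(A^k H W^k).
Z = {k: np.maximum(A[k] @ H_t @ W[k], 0.0) for k in ("d", "tt")}

# Node-wise attention: softmax over graphs of the scores Z^k_{t,i} . w^k.
scores = np.stack([Z[k] @ w[k] for k in ("d", "tt")])  # shape (2, N)
alpha = np.exp(scores) / np.exp(scores).sum(axis=0)    # per-node weights, sum to 1

# Fused embedding: Z^fused_i = sum_k alpha^k_i Z^k_i.
Z_fused = sum(alpha[j][:, None] * Z[k] for j, k in enumerate(("d", "tt")))
```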

3. RL-Based Intelligent Feature Selection and Ranking (RL-IFSR)

RL-IFSR uses a Double DQN (DDQN) to learn a dynamic feature masking policy:

  • State space: Compact feature vector summarizing mean temporal and spatial features for the current batch.
  • Action space: Indexing over all $F_t + F_s$ possible features; each action masks a single feature.
  • Reward: Negative prediction MSE, thus incentivizing policies that retain informative features.
  • Policy training: DDQN with prioritized replay and $\epsilon$-greedy exploration:

$$y = r + \gamma\, Q\left(s', \arg\max_{a'} Q(s', a'; \theta); \theta^-\right)$$

$$\mathcal{L}_\mathrm{DDQN} = \left(y - Q(s, a; \theta)\right)^2$$

Over training, features more often masked are inferred to be less critical for prediction, providing a means to rank feature importance (Rafi et al., 10 Jan 2026).
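The Double DQN target for a single transition can be illustrated as follows; the Q-values are random stand-ins, and this is a sketch of the target computation only, not the full DDQN training loop:

```python
import numpy as np

rng = np.random.default_rng(3)

n_actions, gamma = 7, 0.99
# Stand-in Q-values from the online network (theta) and target network (theta^-).
q_online_next = rng.random(n_actions)  # Q(s', . ; theta)
q_target_next = rng.random(n_actions)  # Q(s', . ; theta^-)
q_online_sa = 0.4                      # Q(s, a; theta) for the taken action
r = -0.25                              # reward = negative prediction loss

# Double DQN: action chosen by the online net, evaluated by the target net.
a_star = int(np.argmax(q_online_next))
y = r + gamma * q_target_next[a_star]

# Squared TD error used as the DDQN loss for this transition.
loss = (y - q_online_sa) ** 2
```

Decoupling action selection (online net) from action evaluation (target net) is what mitigates the overestimation bias of vanilla DQN.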

4. Training Objectives and Empirical Performance

The RL-DMF architecture is optimized via two objectives:

  • Traffic Prediction Loss:

$$\mathcal{L}_\mathrm{pred} = \frac{1}{N_t\, p} \sum_{i=1}^{N_t} \sum_{h=1}^{p} \left(\hat y_i^{(h)} - y_i^{(h)}\right)^2$$

(MSE over detectors and prediction steps)

  • DDQN Loss: As above.

The modules are trained jointly, alternating between end-to-end DMF backpropagation and RL-IFSR policy updates.
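The coupling between the two objectives is direct: the prediction loss is the MSE over detectors and horizons, and its negation is the reward handed to RL-IFSR. A toy numpy sketch (hypothetical dimensions and random arrays):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy predictions/targets over N_t detectors and p prediction horizons.
N_t, p = 5, 6
y_hat = rng.random((N_t, p))
y = rng.random((N_t, p))

# L_pred = (1 / (N_t p)) * sum_i sum_h (y_hat_i^(h) - y_i^(h))^2, i.e. the mean squared error.
L_pred = np.mean((y_hat - y) ** 2)

# The same scalar, negated, is the reward passed to the RL-IFSR policy.
reward = -L_pred
```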

Empirical results (Milton, Hurricane 2024):

RL-DMF achieves a 1-hour RMSE of 293.9 (95% accuracy) and a 6-hour RMSE of 426.4 (90% accuracy), outperforming LSTM, CNN-LSTM, and strong GCN-LSTM baselines, as well as single-graph DDQN-masked variants. Ablation studies show that attention-based multi-graph fusion reduces RMSE by 5–9 points compared to single-graph models, and RL-driven masking yields a further ∼22-point gain (Rafi et al., 10 Jan 2026).

| Model | RMSE | MAE | MAPE (%) | $R^2$ |
|---|---|---|---|---|
| LSTM | 481.0 | 324.2 | 29.6 | 0.87 |
| CNN-LSTM | 537.1 | 357.5 | 28.5 | 0.84 |
| Static GCN-LSTM | 480.1 | 308.7 | 25.6 | 0.87 |
| Dynamic-Dist GCN-LSTM | 482.6 | 313.9 | 26.0 | 0.87 |
| Dynamic-TT GCN-LSTM | 474.9 | 310.2 | 26.2 | 0.88 |
| RL-DMF (proposed) | 426.4 | 281.1 | 25.2 | 0.90 |

5. Interpretability and Feature Importance

RL-IFSR offers a straightforward interpretability mechanism: the masking frequency $f_j$ of each feature across training reflects its predictive importance. Features that are rarely masked are critical; those masked often are redundant or less useful.

Empirical ranking reveals the most important predictors for evacuation traffic forecasting to be:

  1. Recent traffic volumes,
  2. Previous-day and previous-period mean/variance,
  3. Day-of-week, hour-of-day, time-to-landfall indicators,
  4. Cumulative evacuated population.

Variables such as incident flags, lane closures, and road-type encodings exhibit higher masking frequencies, indicating lower relevance. This dynamic feature ranking aligns with domain expectations in evacuation scenarios (Rafi et al., 10 Jan 2026).
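The ranking procedure reduces to counting how often each feature index was masked; the masking log and feature names below are hypothetical:

```python
from collections import Counter

# Hypothetical masking log: which feature index the agent masked at each step.
mask_log = [0, 3, 3, 5, 3, 5, 1, 3, 5, 3]
feature_names = ["recent_volume", "prev_day_mean", "hour_of_day",
                 "incident_flag", "time_to_landfall", "lane_closure"]

# Masking frequency f_j; rarely masked features are inferred to be more important.
freq = Counter(mask_log)
ranked = sorted(range(len(feature_names)), key=lambda j: freq.get(j, 0))
importance = [feature_names[j] for j in ranked]  # most to least important
```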

6. Implementation Details and Considerations

  • Graph backbones: One single-layer GCN per graph, hidden size $H = 64$.
  • Temporal modeling: LSTM with hidden size $H = 64$, sequence length $l = 6$ hours, prediction horizon $p = 6$ hours.
  • Optimization: Adam for DMF (learning rate $10^{-3}$, batch size 32); DDQN with a replay buffer of 20,000, batch size 64, $\gamma = 0.99$, and $\epsilon$ decayed from 1.0 to 0.05 at a rate of 0.995 per epoch.
  • Graph dynamics: The node set $V_t$ and edge sets $E_t^k$ at time $t$ include only online sensors, with edges between currently active pairs. Adjacency matrices are re-normalized for GCN input at each time step.
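The stated exploration schedule (multiplicative decay of 0.995 per epoch, floored at 0.05) can be reproduced in a few lines:

```python
# Epsilon-greedy decay: start at 1.0, multiply by 0.995 each epoch, floor at 0.05.
eps, eps_min, decay = 1.0, 0.05, 0.995
schedule = []
for epoch in range(1000):
    schedule.append(eps)
    eps = max(eps_min, eps * decay)
```

With this rate, the floor of 0.05 is reached after roughly 600 epochs.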

This framework provides a template for general spatiotemporal multi-graph prediction tasks demanding both adaptability (via dynamic graph fusion) and explainability (via RL-based masking), with validated gains under realistic evacuation conditions (Rafi et al., 10 Jan 2026).
