RL-DMF: Dynamic Multi-Graph Fusion
- RL-DMF is a framework that adaptively fuses multi-graph data using RL-driven feature selection for enhanced spatiotemporal predictions.
- It combines dynamic graph construction with reinforcement learning to intelligently mask less critical features, improving prediction accuracy.
- Empirical results indicate significant RMSE improvements and clear feature rankings, providing both precise forecasting and interpretability.
Reinforcement Learning-guided Dynamic Multi-Graph Fusion (RL-DMF) is a framework designed to adaptively and interpretably aggregate heterogeneous graph-structured information for spatiotemporal prediction tasks. RL-DMF integrates reinforcement learning (RL) for feature selection and ranking with dynamic multi-graph attention fusion, offering both predictive accuracy and post hoc interpretability, especially in scenarios such as evacuation traffic forecasting (Rafi et al., 10 Jan 2026).
1. Architecture and Workflow
RL-DMF comprises two tightly coupled modules: (i) an RL-based intelligent feature selection and ranking module (RL-IFSR), and (ii) a dynamic multi-graph fusion module (DMF). The workflow proceeds as follows:
- Inputs:
- Temporal feature tensors representing the time series of each detector over a window of preceding time steps.
- Static spatial feature tensors for each detector.
- RL-IFSR: Observes the state $s_t$, selects a single feature index $a_t$ for masking, and receives the reward $r_t$ (the negative prediction loss).
- Feature Masking: The selected feature is zeroed out in the current batch via binary masks over the temporal and spatial feature tensors.
- Dynamic Multi-Graph Construction & Fusion: At each time step, two dynamic graphs are built to represent different spatial relations: a distance-based graph $G^{d}_t$ and a travel-time-based graph $G^{tt}_t$. Node features, after RL-masking, are processed by individual single-layer GCNs, yielding $H^{d}_t$ and $H^{tt}_t$, which are then fused by node-wise attention into a single embedding $H_t$.
- Temporal Modeling and Prediction: The fused node embeddings are passed through an LSTM and a linear prediction layer to yield the final predictions $\hat{y}$.
- Training: At each mini-batch iteration, RL-driven feature selection and masking are performed, DMF computes predictions, the MSE loss is backpropagated, and the RL module receives the negative loss as a reward to update its policy.
This joint optimization encourages both robust spatiotemporal modeling and interpretability by revealing which features are deemed non-critical by the RL agent (Rafi et al., 10 Jan 2026).
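To make the loop concrete, here is a minimal, self-contained sketch of the masking-and-reward cycle. The DMF forward pass is replaced by a stub loss and the DDQN by a tabular, bandit-style learner; the feature count, importance values, and all names are illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)

NUM_FEATURES = 8  # hypothetical feature count for the sketch

def dmf_predict_loss(masked_out: int) -> float:
    """Stand-in for the DMF forward pass + MSE loss.
    Here we pretend feature 0 is critical: masking it hurts the most."""
    importance = [0.9, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.0]
    return 1.0 + importance[masked_out]

def select_action(q_values, epsilon):
    """Epsilon-greedy over feature indices (the RL-IFSR action space)."""
    if random.random() < epsilon:
        return random.randrange(NUM_FEATURES)
    return max(range(NUM_FEATURES), key=lambda a: q_values[a])

# Tabular stand-in for the DDQN: one Q-value per feature-masking action.
q_values = [0.0] * NUM_FEATURES
epsilon, alpha = 1.0, 0.1

for step in range(500):
    a = select_action(q_values, epsilon)           # pick one feature to mask
    loss = dmf_predict_loss(a)                     # DMF forward + MSE (stubbed)
    reward = -loss                                 # negative prediction loss
    q_values[a] += alpha * (reward - q_values[a])  # bandit-style update
    epsilon = max(0.05, epsilon * 0.995)           # decay exploration

# The agent learns that masking the least important feature is cheapest.
best = max(range(NUM_FEATURES), key=lambda a: q_values[a])
print(best)  # → 7 (the feature whose removal hurts prediction least)
```

Frequently masked features in this loop are exactly those the agent has learned are cheap to remove, which is the basis for the interpretability claim discussed later.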
2. Graph Construction and Fusion Mechanism
At the core of DMF are multiple graph representations of the sensor network:
- Graph Construction: For every time step $t$, let $V_t$ denote the set of currently active detectors. Two graphs are instantiated:
- $G^{d}_t = (V_t, E_t, A^{d}_t)$, with $A^{d}_{t,ij} = d_{ij}$ for each edge $(i,j)$, representing pairwise geographic distances.
- $G^{tt}_t = (V_t, E_t, A^{tt}_t)$, with $A^{tt}_{t,ij} = tt_{ij}$ for each edge $(i,j)$, where $tt_{ij}$ encodes real-time travel time (distance over average velocity).
Both adjacency matrices are min–max normalized to $[0,1]$ and symmetrically normalized, $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, for downstream GCN use.
- GCN and Attention Fusion: Node features $X_t$ are updated via
$H^{d}_t = \sigma(\hat{A}^{d}_t X_t W^{d})$ and $H^{tt}_t = \sigma(\hat{A}^{tt}_t X_t W^{tt})$.
The two GCN output tensors are stacked, and learnable attention vectors compute node-specific fusion weights,
$(\alpha^{d}_i, \alpha^{tt}_i) = \mathrm{softmax}\big(a^{\top} H^{d}_{t,i},\; a^{\top} H^{tt}_{t,i}\big).$
The final node embedding is
$H_{t,i} = \alpha^{d}_i H^{d}_{t,i} + \alpha^{tt}_i H^{tt}_{t,i},$
capturing both static and dynamic spatial dependencies (Rafi et al., 10 Jan 2026).
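The construction and fusion steps above can be sketched in NumPy. The distances, uniform average speed, tensor dimensions, and the shared attention vector `a` are illustrative assumptions; the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, H = 4, 3, 5  # detectors, input features, hidden size (illustrative)

# Raw pairwise quantities: distances, and travel times = distance / avg speed.
dist = np.array([[0, 2, 5, 9],
                 [2, 0, 3, 7],
                 [5, 3, 0, 4],
                 [9, 7, 4, 0]], dtype=float)
speed = 10.0  # assumed uniform average velocity for the sketch
tt = dist / speed

def normalize(A):
    """Min-max scale to [0, 1], add self-loops, then apply the symmetric
    normalization D^{-1/2} (A + I) D^{-1/2} used before a GCN layer."""
    A = (A - A.min()) / (A.max() - A.min())
    A = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

X = rng.normal(size=(N, F))  # node features (post RL-masking)
Wd, Wtt = rng.normal(size=(F, H)), rng.normal(size=(F, H))

relu = lambda z: np.maximum(z, 0.0)
Hd = relu(normalize(dist) @ X @ Wd)   # distance-graph GCN output
Htt = relu(normalize(tt) @ X @ Wtt)   # travel-time-graph GCN output

# Node-wise attention fusion: score each view per node, softmax, weighted sum.
a = rng.normal(size=(H,))             # learnable attention vector (assumed shared)
scores = np.stack([Hd @ a, Htt @ a], axis=1)         # shape (N, 2)
w = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
H_fused = w[:, 0:1] * Hd + w[:, 1:2] * Htt

print(H_fused.shape)  # (4, 5); attention weights sum to 1 per node
```

In a trained model, `Wd`, `Wtt`, and `a` would be learned end-to-end; here they are random draws purely to exercise the shapes and the fusion arithmetic.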
3. RL-Based Intelligent Feature Selection and Ranking (RL-IFSR)
RL-IFSR uses a Double DQN (DDQN) to learn a dynamic feature masking policy:
- State space: Compact feature vector summarizing mean temporal and spatial features for the current batch.
- Action space: Indexing over all possible features; each action masks a single feature.
- Reward: Negative prediction MSE, thus incentivizing policies that retain informative features.
- Policy training: DDQN with prioritized replay and ε-greedy exploration, using the standard Double DQN target
$y_t = r_t + \gamma\, Q_{\theta^-}\!\big(s_{t+1}, \arg\max_{a} Q_{\theta}(s_{t+1}, a)\big).$
Over training, features more often masked are inferred to be less critical for prediction, providing a means to rank feature importance (Rafi et al., 10 Jan 2026).
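As a sketch, the Double DQN target used in such updates (the standard DDQN formulation, not a paper-specific variant) decouples action selection from action evaluation; the toy Q-values below are illustrative.

```python
import numpy as np

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.95):
    """Double DQN target: the online net picks the argmax action,
    the target net evaluates it (reducing overestimation bias)."""
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]

# Toy next-state Q-values over a 4-feature action space.
q_online_next = np.array([0.1, 0.8, 0.3, 0.2])  # online network
q_target_next = np.array([0.2, 0.5, 0.9, 0.1])  # target network
y = ddqn_target(reward=-1.0, q_online_next=q_online_next,
                q_target_next=q_target_next)
print(y)  # -1.0 + 0.95 * 0.5 = -0.525
```

Note that a vanilla DQN would instead take `max(q_target_next)` (0.9 here), yielding a more optimistic target; the decoupling is what DDQN contributes.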
4. Training Objectives and Empirical Performance
The RL-DMF architecture is optimized via two objectives:
- Traffic Prediction Loss:
$\mathcal{L}_{\text{pred}} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{\tau=1}^{T}\big(y_{i,\tau} - \hat{y}_{i,\tau}\big)^2$
(MSE over $N$ detectors and $T$ prediction steps).
- DDQN Loss: the temporal-difference loss of the Double DQN update described in Section 3.
The modules are trained jointly, alternating between end-to-end DMF backpropagation and RL-IFSR policy updates.
Empirical results (Milton, Hurricane 2024):
RL-DMF achieves a 1-hour RMSE of 293.9 (95% accuracy) and a 6-hour RMSE of 426.4 (90% accuracy), outperforming LSTM, CNN-LSTM, and strong GCN-LSTM baselines, as well as single-graph DDQN-masked variants. Ablation studies show that attention-based multi-graph fusion reduces RMSE by 5–9 points compared to single-graph models, and RL-driven masking yields a further ∼22-point gain (Rafi et al., 10 Jan 2026).
| Model | RMSE | MAE | MAPE (%) | Accuracy |
|---|---|---|---|---|
| LSTM | 481.0 | 324.2 | 29.6 | 0.87 |
| CNN-LSTM | 537.1 | 357.5 | 28.5 | 0.84 |
| Static GCN-LSTM | 480.1 | 308.7 | 25.6 | 0.87 |
| Dynamic-Dist GCN-LSTM | 482.6 | 313.9 | 26.0 | 0.87 |
| Dynamic-TT GCN-LSTM | 474.9 | 310.2 | 26.2 | 0.88 |
| RL-DMF (proposed) | 426.4 | 281.1 | 25.2 | 0.90 |
5. Interpretability and Feature Importance
RL-IFSR offers a straightforward interpretability mechanism: the masking frequency of each feature across training reflects its predictive importance. Features that are rarely masked are critical; those often masked are redundant or less useful.
Empirical ranking reveals the most important predictors for evacuation traffic forecasting to be:
- Recent traffic volumes,
- Previous-day and previous-period mean/variance,
- Day-of-week, hour-of-day, time-to-landfall indicators,
- Cumulative evacuated population.
Variables such as incident flags, lane closures, and road-type encodings exhibit higher masking frequencies, indicating lower relevance. This dynamic feature ranking aligns with domain expectations in evacuation scenarios (Rafi et al., 10 Jan 2026).
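A minimal sketch of this ranking procedure, using a hypothetical masking log and feature names (the counts below are invented for illustration):

```python
from collections import Counter

# Hypothetical log of which feature the agent masked at each training step.
mask_log = (["incident_flag"] * 40 + ["lane_closure"] * 35 +
            ["road_type"] * 30 + ["hour_of_day"] * 5 +
            ["recent_volume"] * 2)

counts = Counter(mask_log)
# Least-masked features are ranked most important for prediction.
ranking = sorted(counts, key=lambda f: counts[f])
print(ranking[0])   # most important  → "recent_volume"
print(ranking[-1])  # least important → "incident_flag"
```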
6. Implementation Details and Considerations
- Graph backbones: a single-layer GCN per graph.
- Temporal modeling: an LSTM over the historical input sequence, with prediction horizons of 1 to 6 hours.
- Optimization: Adam for the DMF module (batch size 32); DDQN with a replay buffer of 20,000 transitions, batch size 64, discount factor $\gamma$, and ε decayed from 1.0 to 0.05 at a rate of 0.995 per epoch.
- Graph dynamics: at each time step $t$, the node set contains only online sensors, with edges between currently active pairs; adjacency matrices are renormalized for GCN input at every step.
This framework provides a template for general spatiotemporal multi-graph prediction tasks demanding both adaptability (via dynamic graph fusion) and explainability (via RL-based masking), with validated gains under realistic evacuation conditions (Rafi et al., 10 Jan 2026).