- The paper introduces a hybrid framework that integrates patch-based temporal encoding, edge-aware graph attention, and dual-head multi-task learning for proactive delay prediction.
- It reports F1 = 0.8762 and AUC-ROC = 0.9773 with low run-to-run variance, outperforming conventional tabular models, LSTM, and standard GNN approaches.
- The framework enhances logistics risk management by delivering robust, interpretable predictions and actionable insights through effective use of node and edge features.
EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics Networks
Problem Setting and Motivations
Delivery delay prediction in modern logistics networks involves highly dynamic, multivariate data generated across geographically dispersed nodes (e.g., warehouses) and connecting edges (e.g., freight lanes). Classical predictive approaches, predominantly built on tabular ML models or node-wise time-series analysis, ignore crucial relational context embedded in the logistics graph topology. Existing spatiotemporal GNN-based methodologies have yet to offer robust, low-variance performance in the context of supply chains, especially where Service Level Agreement (SLA) violations are rare and costly. EAGLE directly targets these gaps, proposing a hybrid framework for predictive risk monitoring that explicitly integrates temporal encoding, edge-aware attention, and end-to-end multi-task learning.
Architecture Overview
EAGLE's architecture consists of three main sequential modules: a temporal encoder (PatchTST-Lite), an Edge-Aware Graph Attention Network (E-GAT), and a dual-head prediction module. This design decouples the modeling of temporal dynamics from spatial dependencies and enables dedicated channels for both node- and edge-level information to flow through the learning pipeline.
Figure 1: The EAGLE pipeline: PatchTST-Lite encodes per-node temporal history, E-GAT aggregates neighborhood context with edge-aware attention, and dual MLP heads produce violation probability and delay magnitude.
Temporal Encoding via PatchTST-Lite
PatchTST-Lite tokenizes a 14-step per-node sequence into two non-overlapping 7-day patches, encoding each with a two-layer Transformer with 64-dimensional model size and two heads. Temporal mean pooling yields robust node embeddings invariant to short-term noise and capable of capturing relevant weekly seasonality and order-flow cycles.
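The patching and pooling steps described above can be illustrated with a minimal, framework-free sketch. It only shows how a 14-step history splits into two non-overlapping 7-step patches and how temporal mean pooling collapses the resulting tokens into one node embedding. The Transformer encoder itself is replaced by a simple flattening stand-in, and the function names (`make_patches`, `mean_pool`) and the two toy features per day are assumptions made for illustration, not taken from the paper.

```python
from statistics import fmean


def make_patches(sequence, patch_len=7):
    """Split a sequence into consecutive, non-overlapping patches."""
    if len(sequence) % patch_len != 0:
        raise ValueError("sequence length must be a multiple of patch_len")
    return [sequence[i:i + patch_len] for i in range(0, len(sequence), patch_len)]


def mean_pool(token_embeddings):
    """Average token embeddings element-wise over the token (time) axis."""
    return [fmean(column) for column in zip(*token_embeddings)]


# 14 daily feature vectors for one node; the two features are hypothetical.
history = [[float(day), float(day % 7)] for day in range(14)]
patches = make_patches(history, patch_len=7)

# Stand-in for the Transformer encoder (assumption): flatten each patch
# into a single token vector instead of applying self-attention.
tokens = [[value for step in patch for value in step] for patch in patches]

# Temporal mean pooling over the two patch tokens gives one node embedding.
node_embedding = mean_pool(tokens)
```

In a full implementation, each flattened patch would instead be linearly projected to the 64-dimensional model size and passed through the two-layer, two-head Transformer before pooling.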
Edge-Aware Spatial Interaction Modeling
To model the heterogeneous risk of shipping lanes, E-GAT modifies the conventional GAT by incorporating static or slowly-varying edge features directly into the attention mechanism. Because the attention score is computed from the concatenation of source node features, target node features, and edge features, the learned attention coefficients can reflect lane-level operational context, such as historic transit times and shipping mode distributions.
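One way to picture edge-aware attention is the sketch below, which scores each incoming lane from the concatenation of source features, target features, and edge features, then normalizes with a softmax over the neighborhood. This is a simplified illustration under stated assumptions: the LeakyReLU scoring follows the standard GAT recipe, the linear projections of node and edge features are omitted, and all names and numbers (`edge_aware_attention`, `attn_weights`, the toy vectors) are hypothetical rather than the authors' exact formulation.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU as used in standard GAT scoring."""
    return x if x > 0 else slope * x


def edge_aware_attention(h_target, neighbors, attn_weights):
    """Return softmax-normalized attention over a node's incoming edges.

    neighbors: list of (h_source, edge_features) pairs, one per incoming edge.
    attn_weights: scoring vector applied to [h_source || h_target || edge_features].
    """
    scores = []
    for h_source, edge_feat in neighbors:
        concat = h_source + h_target + edge_feat  # list concatenation
        if len(concat) != len(attn_weights):
            raise ValueError("attn_weights must match the concatenated length")
        raw = sum(w * x for w, x in zip(attn_weights, concat))
        scores.append(leaky_relu(raw))
    # Numerically stable softmax over the neighborhood.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two neighbors with identical node features but different lane features
# (toy values), so any difference in attention comes from the edge alone.
h_target = [0.5, -0.1]
neighbors = [
    ([0.2, 0.4], [1.0, 0.0]),
    ([0.2, 0.4], [3.0, 1.0]),
]
attn_weights = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5]
alphas = edge_aware_attention(h_target, neighbors, attn_weights)
```

With a plain GAT, these two neighbors would receive identical attention because their node features match; including the edge features is what separates them.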
Dual-Head Multi-Task Prediction
Separate MLP heads produce predictions for binary SLA violation (y_class) and continuous delay magnitude (y_reg). A weighted sum of BCE (0.7) and Huber (0.3) losses enables joint optimization, promoting richer representations and improved regularization via the regression auxiliary signal.
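The joint objective can be written out as a short sketch: a batch-averaged binary cross-entropy term weighted 0.7 plus a batch-averaged Huber term weighted 0.3. The Huber threshold `delta=1.0` is an assumption, since the summary does not state it, and class weighting is left out here for brevity. Function names are illustrative.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    p = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


def multitask_loss(probs, labels, delay_preds, delay_targets,
                   w_cls=0.7, w_reg=0.3, delta=1.0):
    """Weighted sum of mean BCE (classification) and mean Huber (regression)."""
    n = len(labels)
    cls = sum(bce(p, y) for p, y in zip(probs, labels)) / n
    reg = sum(huber(d, t, delta) for d, t in zip(delay_preds, delay_targets)) / n
    return w_cls * cls + w_reg * reg


# Toy batch of two nodes: violation probabilities and predicted delays (days).
loss = multitask_loss(
    probs=[0.9, 0.2], labels=[1, 0],
    delay_preds=[2.5, 0.1], delay_targets=[3.0, 0.0],
)
```

The Huber term keeps occasional extreme delays from dominating the gradient, which is a common reason to prefer it over squared error for an auxiliary regression head.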
Methodology: Labeling and Training Protocol
EAGLE employs a temporal labeling protocol that prevents leakage: node features are extracted from a current window, while labels are computed strictly from a disjoint future window. A relative-labeling scheme, which defines a positive violation as a window-average delay exceeding the node’s historical baseline, yields a usable and temporally stable positive rate (≈6.2%) from severely imbalanced raw data. Training is stabilized with prior-informed bias initialization, class-weighted loss, and multi-task gradients. Node-wise features are drawn exclusively from order-level attributes known at placement time, and edge-level features are derived from static (non-outcome) statistics.
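A rough sketch of the labeling protocol follows. It builds disjoint feature and label index windows around a cut-off time, assigns a relative label by comparing the future window's mean delay with a per-node baseline, and computes a prior-informed output bias. Several details are assumptions: the 7-step label window, the baseline taken as the mean delay before the cut-off, and the bias set to the logit of the positive rate (a common convention, not confirmed by the summary). All function names are hypothetical.

```python
import math
from statistics import fmean


def split_windows(t, feature_len=14, label_len=7):
    """Return disjoint index ranges: features from [t - feature_len, t), labels from [t, t + label_len)."""
    return range(t - feature_len, t), range(t, t + label_len)


def relative_label(daily_delay, label_range, baseline):
    """1 if the future window's average delay exceeds the node's baseline, else 0."""
    window_mean = fmean([daily_delay[i] for i in label_range])
    return int(window_mean > baseline)


def prior_bias(positive_rate):
    """Logit of the positive rate, so the untrained head predicts the class prior."""
    return math.log(positive_rate / (1.0 - positive_rate))


# Toy per-node delay series (days): 14 past steps followed by 7 future steps.
daily_delay = [1.0] * 14 + [1.0, 1.0, 3.0, 3.0, 3.0, 1.0, 2.0]
t = 14
feature_range, label_range = split_windows(t)

# Baseline uses only data before the cut-off (assumption), so no future leaks in.
baseline = fmean(daily_delay[:t])
label = relative_label(daily_delay, label_range, baseline)

# Initial bias for the classification head at the reported ~6.2% positive rate.
bias = prior_bias(0.062)
```

Here the delay series is used only to compute labels and the baseline; in line with the protocol above, the model's input features would come from placement-time order attributes within `feature_range`.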
Experimental Validation
Main Results and Baselines
Evaluated on the DataCo Smart Supply Chain dataset (46 nodes, 1,478 edges, more than 150,000 orders), EAGLE consistently surpasses all baseline families: tabular (XGBoost, RF), temporal-only (LSTM), and traditional graph-based (GAT without temporal encoding or edge features). The following F1 and AUC-ROC numbers highlight the empirical advance:
- XGBoost: F1 = 0.6379, AUC-ROC = 0.7455
- Random Forest: F1 = 0.6052, AUC-ROC = 0.7249
- LSTM (no graph): F1 = 0.8095, AUC-ROC = 0.9679
- GAT (no temporal, no edge): F1 = 0.6142, AUC-ROC = 0.6872
- EAGLE: F1 = 0.8762, AUC-ROC = 0.9773 (std = 0.0089 over 4 seeds)
These results underscore that incorporating both temporal sequence modeling and edge-aware spatial aggregation is necessary and synergistic for effective logistics outcome prediction.
Training Stability
Independent runs demonstrate highly stable, low-variance convergence for EAGLE: std(F1) = 0.0089, a 3.8× reduction relative to the closest ablation (std = 0.0338 for GAT without edge features). Notably, the standalone GAT exhibits poor reproducibility (std = 0.0789), supporting the crucial role of temporal grounding and multi-task regularization for practical deployment.
Figure 2: Training loss and validation AUC-ROC across 4 seeds confirm smooth and stable convergence, with validation performance consistently above 0.97 AUC-ROC.
Ablation Analysis
- No Temporal Encoder: Removing PatchTST-Lite causes the largest performance collapse (ΔF1 = -0.191, std = 0.0589).
- No Edge Features: Reduces E-GAT to a traditional GAT, lowering F1 by 0.074 (ΔF1 = -0.074) and raising the F1 standard deviation from 0.0089 to 0.0338, roughly a 3.8× increase.
- No Regression Head: Eliminating the auxiliary regression task lowers F1 by 0.053 (ΔF1 = -0.053) and increases MAE for delay estimation.
Temporal encoding is the overwhelming driver of both accuracy and stability; edge-aware attention and auxiliary regression further enhance both metrics, especially under class imbalance.
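The ablation figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that each ΔF1 is measured against the full model's F1 of 0.8762 and that the 0.0338 standard deviation quoted under Training Stability belongs to the no-edge-features ablation. The implied F1 values are derived here and are not reported directly in the text.

```python
FULL_F1 = 0.8762
FULL_STD = 0.0089
NO_EDGE_STD = 0.0338  # assumed to be the no-edge-features ablation

# Reported F1 changes for each ablation.
ablation_delta_f1 = {
    "no_temporal_encoder": -0.191,
    "no_edge_features": -0.074,
    "no_regression_head": -0.053,
}

# Implied absolute F1 per ablation, assuming deltas are relative to FULL_F1.
implied_f1 = {
    name: round(FULL_F1 + delta, 4)
    for name, delta in ablation_delta_f1.items()
}

# Ratio of standard deviations, and the corresponding ratio of variances.
std_ratio = NO_EDGE_STD / FULL_STD
variance_ratio = std_ratio ** 2

for name, f1 in implied_f1.items():
    print(f"{name}: implied F1 = {f1:.4f}")
print(f"std ratio = {std_ratio:.2f}, variance ratio = {variance_ratio:.1f}")
```

Note that the 3.8× figure refers to standard deviation; expressed as variance, the same gap is the square of that ratio.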
Explainability and Risk Attribution
EAGLE supports dual interpretability: structural risk mapping via aggregated E-GAT attention aligns with domain heuristics, pinpointing central nodes as high-risk hubs, while SHAP-based nodewise ranking enables actionable attribution for each individual prediction.
Figure 3: Supply chain risk heatmap; the central high-volume nodes (e.g., 10, 22, 17) are consistently flagged as major risk sources, in line with domain expectations.
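As an illustration of structural risk mapping, the sketch below aggregates per-edge attention coefficients into a per-node score by summing the attention each node receives as a source across all of its outgoing lanes, then ranks nodes by that score. The aggregation rule (a plain sum over outgoing edges), the function name, and the toy node labels and attention values are all assumptions; the summary does not specify how the heatmap scores are computed.

```python
from collections import defaultdict


def node_risk_from_attention(edge_attention):
    """Sum attention coefficients per source node.

    edge_attention: dict mapping (source, target) -> attention coefficient.
    """
    risk = defaultdict(float)
    for (source, _target), alpha in edge_attention.items():
        risk[source] += alpha
    return dict(risk)


# Toy attention values on a three-node graph (made-up numbers).
edge_attention = {
    ("A", "B"): 0.7,
    ("A", "C"): 0.6,
    ("B", "C"): 0.4,
    ("C", "B"): 0.3,
}

risk = node_risk_from_attention(edge_attention)

# Rank nodes from highest to lowest aggregated attention.
ranking = sorted(risk, key=risk.get, reverse=True)
```

A node that many neighbors attend to strongly accumulates a high score, which matches the intuition that central, high-volume hubs surface as major risk sources.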
Data Integrity and Leakage Prevention
The experimental protocol is transparent about all sources of potential label leakage (direct, algebraic, and temporal) and eliminates them through careful feature removal and strictly non-overlapping label windows. Edge features, by design, do not leak outcome information because they are computed from scheduled (not delivered) attributes, which provides a fair benchmark for comparative evaluation.
Theoretical and Practical Implications
EAGLE demonstrates that explicit decoupling of temporal and structural modeling is empirically and algorithmically superior to monolithic GNN or sequence-only approaches in the context of smart logistics. From a theory perspective, temporal regularization acts as a powerful stabilizer for graph-based architectures under low-label-rate regimes. Practically, the framework is deployable as an early-warning layer in IoT-enabled supply chains, enabling network operators to diagnose and mitigate emergent SLA violation risks with actionable, node-targeted heatmaps.
The approach raises two principal considerations for future research:
- Scalability: The current design, while effective on moderate-sized graphs, may require architectural or training adaptations at real-world scale, such as attention-efficient GNNs or streaming inference.
- Generalizability: Although validated on DataCo, the framework’s transferability to supply chains with substantially different topology or historic order distributions remains to be fully quantified.
Conclusion
EAGLE achieves a marked improvement in proactive delivery delay prediction for smart logistics networks by seamlessly integrating patch-based temporal modeling, edge-feature sensitive spatial attention, and multi-task regularization. The resulting framework delivers robust predictive accuracy (F1 = 0.8762, AUC-ROC = 0.9773) with suppressed training variance, outperforming established temporal and graph-based methods. The empirical findings support further investigation into large-scale supply chain graph modeling with temporal and relational hybridization as key architectural principles.