- The paper introduces a framework that integrates risk-sensitive scoring with regime-aware specialist routing to improve ETF volatility forecasts.
- It employs a four-stage pipeline with dynamic gating and a conditional HAR-RV floor to mitigate underprediction during volatile market conditions.
- Empirical results show a reduction of roughly 24% in median QLIKE loss relative to a rolling-best benchmark during high-volatility regimes across six major ETFs.
Risk-Sensitive Specialist Routing for Volatility Forecasting
Introduction
This paper presents a framework for next-day volatility forecasting of Exchange-Traded Funds (ETFs) that uses risk-sensitive, regime-aware specialist routing to adapt model selection and forecast combination as market states evolve. The motivation is the empirical instability of model performance under shifting financial regimes: no single model remains optimal across both calm and stressed conditions. The central hypothesis is that combining regime-dependent, risk-adjusted online scoring with state-dependent specialist selection yields significant improvements, particularly during periods of market stress.
Methodology
The framework operates via a four-stage pipeline, starting with input and model-pool specification, followed by online risk-sensitive specialist scoring, state-dependent routing, and multi-branch forecast combination. The input space integrates both ETF-based and macro-financial covariates. The realized variance for each ETF is forecasted using a model pool comprising HAR-RV, GARCH-t, FIGARCH, GRU, and XGBoost, with outputs evaluated on a rolling, walk-forward protocol.
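HAR-RV appears both in the model pool and, later, as a forecast floor, so a sketch of its regressors may be useful. The version below follows the standard HAR-RV specification, which regresses next-day realized variance on daily, weekly, and monthly averages of past realized variance. The 5-day and 22-day windows are the conventional choice and are an assumption here, as are the function names and the coefficient tuple; the summary does not state the paper's exact specification.

```python
def har_features(rv, t, week=5, month=22):
    """Return (daily, weekly, monthly) HAR regressors using data up to index t.

    rv is a list of realized variances; week and month are assumed window
    lengths (the conventional 5 and 22 trading days).
    """
    if t + 1 < month:
        raise ValueError("need at least `month` observations up to index t")
    daily = rv[t]
    weekly = sum(rv[t - week + 1 : t + 1]) / week
    monthly = sum(rv[t - month + 1 : t + 1]) / month
    return daily, weekly, monthly


def har_forecast(rv, t, coefs):
    """Next-day forecast from (intercept, b_daily, b_weekly, b_monthly).

    The coefficients are assumed to come from an OLS fit on the
    estimation window; the fit itself is not shown here.
    """
    b0, b_daily, b_weekly, b_monthly = coefs
    daily, weekly, monthly = har_features(rv, t)
    return b0 + b_daily * daily + b_weekly * weekly + b_monthly * monthly
```

In a walk-forward setting, the coefficients would be re-estimated on each estimation window before `har_forecast` is called for the next day.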
A central feature is the risk-sensitive loss function, which extends standard QLIKE with an explicit penalty on underprediction, so that severe underestimation of realized volatility is penalized sharply. Each model is scored by its excess loss relative to the best-performing model on the same date. State-dependent routing uses an interpretable, regime-aware gate driven by observable market-state vectors, and relies on specialist pools tailored to calm and stressed regimes rather than a black-box mixture-of-experts architecture.
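The scoring idea can be illustrated with a minimal sketch. The QLIKE form below is the common normalized version, which is zero for a perfect forecast. The summary does not give the functional form of the underprediction penalty, so the squared log-shortfall term and the `penalty_weight` parameter are assumptions chosen only for illustration.

```python
import math


def qlike(realized, forecast):
    """Normalized QLIKE loss: zero when forecast equals realized variance."""
    ratio = realized / forecast
    return ratio - math.log(ratio) - 1.0


def risk_sensitive_loss(realized, forecast, penalty_weight=1.0):
    """QLIKE plus an assumed penalty that is active only on underprediction."""
    # Positive only when the forecast is below the realized variance.
    shortfall = max(0.0, math.log(realized / forecast))
    return qlike(realized, forecast) + penalty_weight * shortfall ** 2


def excess_losses(realized, forecasts, penalty_weight=1.0):
    """Loss of each model minus the loss of the best model on the same date."""
    losses = {
        name: risk_sensitive_loss(realized, value, penalty_weight)
        for name, value in forecasts.items()
    }
    best = min(losses.values())
    return {name: loss - best for name, loss in losses.items()}
```

For example, `excess_losses(2.0, {"har": 1.0, "garch": 2.0})` assigns zero excess loss to the model that matched the realized variance and a positive excess loss to the one that underpredicted.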
Figure 1: Risk-sensitive specialist routing architecture, including scoring, gating, and multi-path forecast combination mechanisms.
The routing set is determined by a selection threshold that blends local (regime-specific) and global performance quantiles, so the threshold adapts as the effective sample size within a regime changes. The routing layer constructs calm and stress branches, each with a prespecified specialist pool. The final forecast results from a double gating procedure that blends branch outputs using a stress score, obtained by passing a linear index of state variables through a logistic function. An additional stability mechanism applies a conditional HAR-RV lower bound (“HAR floor”) when the stress score or forecast disagreement exceeds a threshold.
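A simplified sketch of these mechanisms follows. It is not the paper's implementation: the shrinkage rule for mixing local and global quantiles, the single (rather than double) gate, and all parameter names and cutoff values are assumptions made for illustration.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def blended_threshold(local_scores, global_scores, q=0.5, shrinkage=20.0):
    """Mix local and global quantiles; more local data gives more local weight."""
    n_local = len(local_scores)
    if n_local == 0:
        return quantile(global_scores, q)
    weight = n_local / (n_local + shrinkage)  # assumed shrinkage rule
    return weight * quantile(local_scores, q) + (1 - weight) * quantile(global_scores, q)


def stress_score(state, weights, intercept=0.0):
    """Logistic transform of a linear index of state variables."""
    index = intercept + sum(w * x for w, x in zip(weights, state))
    if index >= 0:
        return 1.0 / (1.0 + math.exp(-index))
    e = math.exp(index)  # numerically stable branch for negative index
    return e / (1.0 + e)


def combine(calm, stress, har, score, disagreement,
            score_cut=0.7, disagreement_cut=0.5):
    """Blend branch forecasts and apply a conditional HAR floor."""
    blended = (1 - score) * calm + score * stress
    if score > score_cut or disagreement > disagreement_cut:
        return max(blended, har)  # floor only binds in flagged conditions
    return blended
```

The floor is conditional by design in this sketch: in calm conditions with low disagreement the blended forecast passes through unchanged, so the HAR forecast cannot pull quiet-period forecasts upward.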
Experimental Design
The evaluation utilizes daily data for six major ETFs—SPY, QQQ, IWM, EEM, GLD, and TLT—from 2015 to 2025, with rolling windows (504 trading days for estimation, 252 for benchmarking) and sequential forecast issuance. All methods are assessed under identical protocols, ensuring fair comparison.
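The rolling protocol can be sketched as an index generator. The 504-day and 252-day lengths come from the text above; the interpretation of the 252-day window as a trailing span of already-issued forecasts used to score models is an assumption, as are the function and argument names.

```python
def walk_forward(n_obs, estimation_window=504, scoring_window=252):
    """Yield (t, fit_range, score_range) for each forecast target index t.

    fit_range covers the observations used to estimate models for date t.
    score_range covers earlier forecast dates whose realized values are
    already known at t (assumed use of the 252-day window).
    """
    for t in range(estimation_window, n_obs):
        fit_range = range(t - estimation_window, t)
        # Out-of-sample forecasts only exist from index estimation_window on.
        score_start = max(estimation_window, t - scoring_window)
        score_range = range(score_start, t)
        yield t, fit_range, score_range
```

Both ranges end strictly before `t`, so no information from the target date or later enters estimation or scoring.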
Benchmark comparators include both static single-model and adaptive approaches (static/rolling best, VIX-switch). Performance is measured via median QLIKE and underprediction loss across full-sample, low-, mid-, and high-volatility regimes. Routing diagnostics include branch usage rates, selected regret, and miss-best rates.
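As an illustration of the regime-level reporting, the sketch below computes median loss for the full sample and for low, mid, and high regimes. The summary does not say how regimes are delimited, so the tercile split on realized variance is an assumption.

```python
from statistics import median


def regime_labels(realized):
    """Label each date low/mid/high by terciles of realized variance (assumed)."""
    ordered = sorted(realized)
    n = len(ordered)
    low_cut = ordered[n // 3]
    high_cut = ordered[(2 * n) // 3]
    labels = []
    for rv in realized:
        if rv < low_cut:
            labels.append("low")
        elif rv >= high_cut:
            labels.append("high")
        else:
            labels.append("mid")
    return labels


def median_loss_by_regime(realized, losses):
    """Median loss over the full sample and within each non-empty regime."""
    labels = regime_labels(realized)
    result = {"full": median(losses)}
    for name in ("low", "mid", "high"):
        bucket = [loss for loss, label in zip(losses, labels) if label == name]
        if bucket:
            result[name] = median(bucket)
    return result
```

Reporting medians rather than means keeps a handful of extreme loss days from dominating the comparison, which matters for QLIKE because it grows quickly under severe underprediction.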
Empirical Results
Regime-Dependent Specialist Superiority
Model dominance is regime-sensitive. GRU prevails in low-volatility intervals for all assets, but its performance degrades significantly under stress, where GARCH-t and FIGARCH outperform. This regime dependence confirms that no single model is best across all regimes.
Figure 2: Regime-conditioned winner heatmap, demonstrating the shift in top-performing methods by volatility regime.
Routing Robustness and Stress Adaptation
The proposed routing architecture consistently reduces high-volatility forecast loss. In the high-volatility regime, relative to rolling-best, median QLIKE is reduced by approximately 24% and underprediction loss by roughly 22%. These improvements are evident across assets, particularly for equity ETFs (IWM, QQQ, SPY) and TLT, where baseline methods frequently fail due to repeated, severe underforecasting.
Figure 3: TLT robustness comparison (log scale), demonstrating the routing framework’s ability to avoid catastrophic misspecification affecting standard approaches.
Routing diagnostics reveal increased stress-branch usage in stressed periods, with calm-branch contributions remaining non-trivial for stability. Selected regret remains contained, despite miss-best rates exceeding 60%, affirming that the routing set’s value lies in excluding underperforming models rather than pinpointing the per-date best.
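The two diagnostics can be made concrete with a short sketch. The definitions used here are assumptions, since the summary does not spell them out: the miss-best rate is taken as the share of dates on which the ex-post best model is absent from the routing set, and selected regret as the average loss of the routed models minus the best model's loss.

```python
def routing_diagnostics(losses_by_date, routed_by_date):
    """Compute miss-best rate and mean selected regret (assumed definitions).

    losses_by_date: list of dicts mapping model name to realized loss.
    routed_by_date: list of non-empty collections of routed model names.
    """
    misses = 0
    regrets = []
    for losses, routed in zip(losses_by_date, routed_by_date):
        best_model = min(losses, key=losses.get)
        if best_model not in routed:
            misses += 1
        routed_loss = sum(losses[m] for m in routed) / len(routed)
        regrets.append(routed_loss - losses[best_model])
    n_dates = len(regrets)
    return {
        "miss_best_rate": misses / n_dates,
        "mean_selected_regret": sum(regrets) / n_dates,
    }
```

Under these definitions a high miss-best rate and a small regret are compatible: the routing set can often exclude the single best model while still containing models whose losses are close to it.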
Importance of Risk Sensitivity
Ablation studies indicate that omitting either risk-sensitive scoring or the HAR floor markedly degrades performance, most visibly in overall underprediction loss, in tail underprediction loss, and on days in the upper decile of realized volatility. The underprediction penalty is critical for robustness against severe adverse shocks.
Cross-Asset Comparison and Statistical Significance
The routing approach is consistently competitive or superior across the asset panel, except in extremely quiet market states, where GRU or VIX-switch can suffice. In high-volatility and aggregate measures, the specialist routing remains dominant. Diebold–Mariano tests find statistical support for the routing forecast’s superiority—especially against the naive VIX-switch rule—across most assets.
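For readers unfamiliar with the test, a basic Diebold-Mariano statistic for one-step-ahead forecasts can be sketched as follows. This version uses the plain sample variance of the loss differential and a normal approximation, with no autocorrelation (HAC) adjustment and no small-sample correction; whether the paper applies either is not stated in this summary.

```python
import math


def diebold_mariano(losses_a, losses_b):
    """Return (statistic, two-sided p-value) for equal predictive accuracy.

    A negative statistic means forecast A has lower average loss than B.
    Assumes one-step-ahead forecasts, so no autocovariance terms are used.
    """
    diffs = [a - b for a, b in zip(losses_a, losses_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    stat = mean_diff / math.sqrt(var_diff / n)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

The inputs would be the per-date losses (for example QLIKE) of the routing forecast and a comparator over the same evaluation dates.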
Median QLIKE differences by regime show asymmetric gains: relative to rolling-best and HAR-RV, the routing method is slightly worse in low-volatility regimes but substantially better in high-volatility regimes, and better overall.
Figure 4: Cross-asset median QLIKE differences by regime, showing the routing forecast’s relative gains as volatility increases.
Implications and Future Directions
This research demonstrates that regime-conditioned specialist routing produces adaptive, robust forecasts for ETF volatility. The explicit combination of risk-sensitive, excess-loss scoring and interpretable, regime-aware gating outperforms both black-box and naive switching rules under market stress. Practically, the framework mitigates catastrophic model failures and provides an effective operational structure for volatility prediction in real-world settings where model stability and stress robustness are critical.
Theoretically, the results question the value of universal model selection for nonstationary financial series and advocate for dynamic, regime-cognizant adaptive architectures. The framework’s transparency enables diagnostic interpretability and auditability, necessary for risk-sensitive applications in portfolio construction or macroprudential monitoring.
Future research should extend specialist routing to richer volatility targets (e.g., realized semivariances, tail risk metrics), experiment with data-driven or latent state representations, and scale to broader cross-asset panels. Integrating regime conditioning with hierarchical mixture models and online uncertainty quantification also warrants further investigation.
Conclusion
Risk-sensitive specialist routing for ETF volatility forecasting provides a principled, adaptive solution for the challenges posed by regime-dependent model instability. Its capacity to blend interpretability, robustness under stress, and operational feasibility situates it as a valuable methodology for both academic study and industry deployment in risk management and forecasting pipelines.
For additional details and code, consult (2604.10402).