Interpreting conflicting rankings of volatility forecasts out of sample

Establish robust and interpretable evaluation frameworks for out-of-sample realized variance forecasting that reconcile conflicting model rankings across loss functions (RMSE, MAE, QLIKE, Mincer-Zarnowitz regression R^2) and aggregation schemes (average firm, average cross-section, pooled panel).
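
For concreteness, here is a minimal sketch (not the paper's code) of the four evaluation criteria named above, applied to a realized-variance series rv and a variance forecast h; the synthetic data and forecast labels at the bottom are purely illustrative, and the criteria can, but need not, agree on which forecast is better:

```python
import numpy as np

def rmse(rv, h):
    """Root mean squared error of a variance forecast h against realized variance rv."""
    return np.sqrt(np.mean((rv - h) ** 2))

def mae(rv, h):
    """Mean absolute error."""
    return np.mean(np.abs(rv - h))

def qlike(rv, h):
    """QLIKE loss, rv/h - log(rv/h) - 1; asymmetric and robust to noise
    in the volatility proxy."""
    ratio = rv / h
    return np.mean(ratio - np.log(ratio) - 1.0)

def mz_r2(rv, h):
    """R^2 of the Mincer-Zarnowitz regression rv_t = a + b*h_t + e_t.
    For univariate OLS with an intercept this equals the squared sample correlation."""
    return np.corrcoef(h, rv)[0, 1] ** 2

# Purely illustrative data: a biased-but-precise forecast vs. an unbiased-but-noisy one.
rng = np.random.default_rng(0)
rv = np.exp(rng.normal(0.0, 0.5, size=1000))
h_biased = rv * np.exp(rng.normal(0.10, 0.15, size=1000))
h_noisy = rv * np.exp(rng.normal(0.00, 0.45, size=1000))
for name, h in [("biased/precise", h_biased), ("unbiased/noisy", h_noisy)]:
    print(f"{name:15s} RMSE={rmse(rv, h):.3f} MAE={mae(rv, h):.3f} "
          f"QLIKE={qlike(rv, h):.4f} MZ-R2={mz_r2(rv, h):.3f}")
```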

Background

Using multiple loss functions and aggregation approaches, the authors find that the volatility forecasting models are ranked inconsistently across panels and metrics, reflecting known ambiguities in volatility forecast evaluation. This inconsistency hampers clear model selection and motivates improved interpretive methods; open questions remain as to how such conflicting results should be properly interpreted.
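
One source of the ambiguity is the aggregation step itself: per-observation losses can be averaged within each firm first, within each cross-section first, or over the pooled panel, and on an unbalanced panel these need not agree. A hypothetical sketch, assuming a firms-by-dates loss matrix with NaNs marking missing observations (the scheme labels are illustrative, not necessarily the paper's):

```python
import numpy as np

def aggregate(losses, scheme):
    """Aggregate a (n_firms, n_dates) matrix of per-observation losses for one model.

    'avg_firm'  : mean over time within each firm, then mean across firms
    'avg_cross' : mean across firms at each date, then mean over time
    'pooled'    : single mean over all firm-date observations
    On a balanced panel all three coincide; with missing observations (NaNs)
    the implicit weights differ and the schemes can reorder models.
    """
    if scheme == "avg_firm":
        return float(np.nanmean(np.nanmean(losses, axis=1)))
    if scheme == "avg_cross":
        return float(np.nanmean(np.nanmean(losses, axis=0)))
    if scheme == "pooled":
        return float(np.nanmean(losses))
    raise ValueError(f"unknown scheme: {scheme}")

# Hypothetical unbalanced panel: 50 firms, 250 dates, ~20% missing at random.
rng = np.random.default_rng(1)
losses = rng.exponential(1.0, size=(50, 250))
losses[rng.random(losses.shape) < 0.2] = np.nan
for scheme in ("avg_firm", "avg_cross", "pooled"):
    print(scheme, round(aggregate(losses, scheme), 4))
```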

References

Pollok (2025), "Predicting Realized Variance Out of Sample: Can Anything Beat The Benchmark?", arXiv:2506.07928, Section 6.1 ("Volatility Forecast Rankings").