Hierarchical Forecasting at Scale (2310.12809v2)
Abstract: Existing hierarchical forecasting techniques scale poorly as the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it gives practitioners a method of producing bottom-level forecasts that are coherent to any chosen cross-sectional or temporal hierarchy. In addition, it removes the post-processing step required by traditional hierarchical forecasting techniques, which reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% better (in RMSE) than the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, which improves forecasting performance by 2% at the product level. Finally, we find an improvement in forecasting performance of about 5-10% when evaluating across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
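To illustrate the idea of a loss that is evaluated across a hierarchy while only bottom-level forecasts are learned, the following is a minimal sketch and not the exact loss formulation of the paper. It assumes a two-level cross-sectional hierarchy (total, groups, bottom-level series), an unweighted squared error at every level, and hypothetical helper names (`build_summing_matrix`, `hierarchical_squared_loss`). The aggregation matrix is stored in sparse form, so the cost of the loss and its gradient grows with the number of non-zero entries rather than with the full matrix size.

```python
import numpy as np
from scipy import sparse


def build_summing_matrix(groups):
    """Sparse aggregation matrix S for a total / group / bottom hierarchy.

    groups[i] is the integer group id of bottom-level series i.
    Rows of S are ordered as: [total, one row per group, one row per series].
    """
    groups = np.asarray(groups)
    n_bottom = groups.size
    n_groups = int(groups.max()) + 1
    total = sparse.csr_matrix(np.ones((1, n_bottom)))
    group_rows = sparse.csr_matrix(
        (np.ones(n_bottom), (groups, np.arange(n_bottom))),
        shape=(n_groups, n_bottom),
    )
    bottom = sparse.identity(n_bottom, format="csr")
    return sparse.vstack([total, group_rows, bottom], format="csr")


def hierarchical_squared_loss(y_true, y_pred, S):
    """Squared error summed over all aggregations in S, plus its gradient.

    y_true, y_pred: arrays of shape (n_bottom, n_timesteps).
    Returns (loss, gradient with respect to the bottom-level y_pred).
    Assumption: every level is weighted equally.
    """
    residual = S @ (y_pred - y_true)   # errors at every level of the hierarchy
    loss = 0.5 * float(np.sum(residual ** 2))
    grad = S.T @ residual              # chain rule back to the bottom level
    return loss, grad


# Three bottom-level series: the first two share group 0, the third is group 1.
S = build_summing_matrix([0, 0, 1])
y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = np.array([[1.5], [2.0], [2.0]])
loss, grad = hierarchical_squared_loss(y_true, y_pred, S)
```

Because aggregate forecasts in this sketch are obtained as `S @ y_pred`, they are coherent by construction, and no separate reconciliation step is needed at prediction time. A gradient of this form could in principle be supplied to a gradient boosting library as a custom objective, although how the levels are weighted is a design choice that this sketch leaves open.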